I, Staffan Johansson, do hereby make the following declaration:

1. I am a Professor of Medical Cell Biology at the University of Uppsala.

2. My curriculum vitae is provided as Appendix 1.

3. I have no personal interest in the outcome of the prosecution of US Patent 
Application No. 09/980,403. 

4. I have actively worked in the field of integrin research for the last 20 years.

5. I have been asked to comment on the following publications with respect
to their reception in the field of integrin research at the time of their publication and their
relevance to the field:
Gullberg et al., Dev. Dyn. 204:57-65 (1995)
Velling et al., J. Biol. Chem. 274:25735-42 (1999)
I have read and understood these publications. 

6. The field of integrin research, and specifically integrin gene cloning, was 
one of the most competitive fields in biomedical research in the 1990's. The integrins 
were found to be fundamental for the function of all cells in the body, as shown by the 
range of defects resulting from mutations or deletions in the different integrin genes. 
These defects include bleeding disorders, inflammatory disorders, cancer metastasis, 
lack of immune response, skin detachment (epidermolysis bullosa), and muscle dystrophy 1.
Therefore, the identification and cloning of each integrin subunit was considered a major 
event in biomedical research, which in many cases opened up a whole new area of 
investigation. 

7. New integrin subunits were identified and cloned at a rapid pace in the 
early 1990's, mostly by techniques that exploited the regions of homology between 
already known integrin subunits, but by 1995 the discovery of new integrin subunits had 
slowed dramatically. My assessment in 1995 and the subsequent years was that 
possibly all integrin subunits had been identified and cloned. This was also a general 
view in the integrin research community 2. This belief was based on the sharp drop in the
rate of discovery of new integrin subunits in spite of improved knowledge, reagents, and
methods, such as advanced knowledge of conserved sequences and domains of
integrins, the availability of PCR technology and integrin-specific antibodies, and
improved knowledge of the conditions for co-immunoprecipitation and affinity
chromatography of integrins. In addition, reports - either in publications 3, scientific
meetings, or informal discussions with colleagues - indicating the presence of
additional, yet unidentified integrins had ceased.

1 See, e.g., Hynes, R.O., Integrins: versatility, modulation, and signaling in cell adhesion. Cell.
1992 Apr 3;69(1):11-25. All references cited in this declaration are provided as Appendix 2.

8. The only exception I knew of at that time was Donald Gullberg and his
publication of certain bands in polyacrylamide gels, which he felt may represent an
integrin α subunit which did not correlate to any of the known integrins. Based on these
gel bands Gullberg et al. postulated the existence of a new integrin α subunit which they
termed integrin αmt. See Gullberg et al., Dev. Dyn. 204:57-65 (1995).

9. Gullberg's report was met with skepticism in the integrin research
community, including myself. As explained in paragraph 7, the general belief was that possibly all
integrins had been identified and cloned at that point. Gullberg's proposal of an
additional, yet unidentified integrin α chain based solely on biochemical experiments,
but without providing any sequence of the postulated molecule, seemed to be
scientifically weak. The complexity of the integrin family was well recognized at that
time 4, and Gullberg simply did not provide any evidence that would have distinguished
his gel bands from known integrins or from alternative splicing or glycosylation products
of known integrins. Actually, Gullberg did not provide any hard evidence that his gel
bands correlated to any integrin at all. There was only circumstantial evidence in the
molecular size and the co-immunoprecipitation with integrin β1.

2 See, e.g., Hynes, R.O., Cell adhesion: old and new questions. Trends Cell Biol. 1999
Dec;9(12):M33-7.

3 See, e.g., Hynes, R.O., Cell adhesion: old and new questions. Trends Cell Biol. 1999
Dec;9(12):M33-7.

10. Since the discovery of the first integrins, odd reports of "integrin-like"
genes or proteins from various sources had appeared regularly. Such reports related to
"integrins" in plants and yeast, single chain "integrins", and truncated forms of integrins 5.
In many cases these reports turned out not to describe true integrins. For example, it is
now known that integrins do not exist in plants and yeast 6. There was a general
suspicion in the research community towards these odd reports and in the absence of
convincing evidence the experts in the field had a healthy critical view towards any
claims of a new integrin in the absence of sequence data. I have experienced this
general scepticism myself when my initial attempts to publish biochemical data on the
isolation of a new integrin α chain were rejected by the peer reviewers of two different
journals. In the absence of complete sequence data my peers simply were
unconvinced that we had actually identified a novel integrin α subunit 7. Only after
obtaining more sequence information were we able to publish the characterization of the
novel integrin subunit α9 8. This general suspicion and scepticism also applied to Gullberg's
report of integrin αmt, even though he somehow managed to get his biochemical data published.

4 See, e.g., Hynes, R.O., Integrins: versatility, modulation, and signaling in cell adhesion. Cell.
1992 Apr 3;69(1):11-25; Loftus, J.C., et al., Integrin-mediated cell adhesion: the extracellular
face. J Biol Chem. 1994 Oct 14;269(41):25235-8.

5 See, e.g., Gale, C., et al., Cloning and expression of a gene encoding an integrin-like protein in
Candida albicans. Proc Natl Acad Sci USA. 1996 Jan 9;93(1):357-61; Berg, R.W., et al., Cloning and
characterization of a novel beta integrin-related cDNA coding for the protein TIED ("ten beta integrin
EGF-like repeat domains") that maps to chromosome band 13q33: A divergent stand-alone integrin stalk
structure. Genomics. 1999 Mar 1;56(2):169-78; Laval, V., et al., A family of Arabidopsis plasma
membrane receptors presenting animal beta-integrin domains. Biochim Biophys Acta. 1999 Nov
16;1435(1-2):61-70.

6 See, e.g., Rubin, G.M., et al., Comparative genomics of the eukaryotes. Science. 2000 Mar
24;287(5461):2204-15.

7 An example of a relevant communication by a peer reviewer is provided as Appendix 3.

11. The obvious way to clone Gullberg's postulated novel integrin for one of
ordinary skill in the art was to take G6 muscle cells as a source material for construction
of a library (cDNA or expression library), for isolation of mRNA for PCR amplification, or
for affinity purification of the target protein for amino terminal sequencing. One could
also use the cells to raise antibodies against cell surface molecules, including integrins,
and select antibodies based on their ability to block adhesion of G6 cells to extracellular
matrix proteins. Such antibodies could then be used for affinity purification of the target
protein or for screening of an expression library. Probes for cDNA library screening and
primers for PCR amplification would have been designed based on the homology
regions of known integrin subunits. However, to the best of my knowledge no one, including
Gullberg himself, succeeded in cloning the postulated new integrin αmt by any of these
methods. After having failed to clone the postulated integrin αmt by any of these
methods one of ordinary skill in the art would have concluded that most likely there was
no novel integrin present in these muscle cells. This person would have had no good
reason to experiment randomly with other source materials since no evidence for the
expression of the postulated integrin αmt in other cell types existed and the expectation
of success would have been dramatically reduced because of the known cell type-,
differentiation state-, and developmental stage-specific expression and regulation
patterns of integrins 9. Thus, there was no good reason to try to clone integrin αmt from
uterus tissue.

8 Forsberg, E., et al., Purification and characterization of integrin alpha 9 beta 1. Exp Cell Res.
1994 Jul;213(1):183-90.

12. The Velling et al. report in 1999 of the cloning of the new integrin α11 and
its correlation to the previously reported integrin αmt came as a surprise to the field, for
all the reasons outlined above. However, despite the statement by Velling et al. that the
newly cloned integrin α11 correlates to the integrin αmt of Gullberg et al., this has never
been clearly demonstrated. The correlation is based purely on a few biophysical
properties (see Velling et al.) but not on any sequence identity. Gullberg's integrin αmt
has to the best of my knowledge never been cloned from myotubes, thereby precluding
any sequence comparison. Hence, whether integrin αmt identified by Gullberg et al. is
identical to integrin α11 identified by Velling et al. remains unknown to this day.

13. As explained in paragraph 6, the identification and cloning of each integrin subunit
was considered a major event in biomedical research, which in many cases opened a
whole new area of investigation after identification of each unique integrin function.
Examples are αIIbβ3 (expressed on thrombocytes, important regulatory role in blood
coagulation), αLβ2 (role in the immune system, deficiency gives LAD, leucocyte
adhesion deficiency), α6β4 (expressed on hemidesmosomes, deficiency gives
epidermolysis bullosa), αvβ6 (regulates TGFbeta, important regulatory role for the
immune system and matrix turnover 10).


9 See, e.g., Hynes, R.O., Integrins: versatility, modulation, and signaling in cell adhesion. Cell.
1992 Apr 3;69(1):11-25.

10 See, e.g., Munger, J.S., et al., The integrin alpha v beta 6 binds and activates latent TGF 
beta 1: a mechanism for regulating pulmonary inflammation and fibrosis. Cell. 1999 Feb 
5;96(3):319-28. 
14. I further declare that all statements made herein of my own knowledge are
true and that all statements made on information and belief are believed to be true, and 
further, that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code, and that such willful false 
statements may jeopardize the validity of the application or any patent issuing thereon. 

Dated: January 16, 2008 By: 
APPENDIX 1

OF

DECLARATION UNDER 37 C.F.R. § 1.132
DATED JANUARY 16, 2008
BY

STAFFAN JOHANSSON 


Content: 

Curriculum Vitae of Staffan Johansson (10 pages)
Affiliation Dept. of Medical Biochemistry and Microbiology, Uppsala Univ.
Positions Professor in medical cell biology, Univ. of Uppsala, 000101-.

Assoc. professor in medical cell biology, Univ. of Uppsala, 990302-991231.

Assoc. professor in the field of "Zoological cell biology", defrayed by
Chem., Univ. of Uppsala, 921201-990228.

Leave of absence (taking care of children), 50% of 11 months during the 
period 941121-951231. 

Various positions at the Dept. of Med. and Physiol. Chem., Univ. of 
Uppsala, 830901-921130. 

Visiting research associate at the Connective Tissue Laboratory, 
University of Alabama in Birmingham, USA, 791203-810518. 

PhD position, 780701-830831. 

Other assignments 

Principal PhD supervisor for: 

Erik Forsberg (PhD exam 931203)
Peter McCourt (PhD exam 990226)
Krister Wennerberg (PhD exam 990416)
Gunbjorg Svineng (PhD exam 991216)
Lars Lohikangas (PhD exam 000530)
Annika Armulik (PhD exam 001103)
Stina Nilsson (PhD exam 060505)
Associated Postdoctoral fellows:
Nancy Martin (1989-1992)
Paola Longati (2001/2002)
Jian-He Wu (2002/2003)
Sophie Johansson (2001-2003)
Teet Velling (2002-2004)
Nathalie Bot (2006/2007)
J. Cell Biol., Mol. Cell Biol., J. Biol. Chem., Oncogene, Exp. Cell Res., J.
Cell. Biochem., Eur. J. Biochem., J. Clin. Invest., Matrix Biol., Thromb.
Res., FEBS Lett.

Evaluator of grant applications for 
Cancerfonden, Sweden, 2007- 
Swedish Medical Research Council, 2000-2007
The Alzheimer's Association, 1999- 
EU, Cancer, 2003 
The Wellcome Trust, UK, 2003 
Human Frontier Science Program, 1999. 

The Cell Biol. Research Program of the Academy of Finland, 1998. 
Present major external grants


Swedish Research Council (Medicine), 1985-


Swedish Cancer Foundation, 1998- 


PUBLICATIONS 


Original articles 

1. Rubin, K., Johansson, S., Pettersson, I., Ocklind, C., Obrink, B., Hook,
M. 1979. "Attachment of rat hepatocytes to collagen and fibronectin: a
study using antibodies against the surface components". Biochem. Biophys.
Res. Commun. 91, 86-94

2. Johansson, S., Rubin, K., Hook, M., Ahlgren T., Seljelid, R. 1979. "In 
vitro biosynthesis of cold insoluble globulin (fibronectin)". FEBS Lett. 105, 
313-316 

3. Hedman, K., Kurkinen, M., Alitalo, K., Vaheri, A., Johansson, S., 
Hook, M. 1979. "Isolation of the pericellular matrix of human fibroblast 
cultures". J. Cell Biol. 81, 83-91 

4. Johansson, S., Hook, M. 1980. "Heparin enhances the rate of binding 
of fibronectin to collagen". Biochem. J. 187, 521-524 

5. Rubin, K, Johansson, S., Hook, M., Obrink, B. 1981. "Substrate 
adhesion of rat hepatocytes: On the role of fibronectin in cell spreading". 
Exp. Cell Res. 135, 127-135 

6. Johansson, S., Kjellen, L., Hook, M., Timpl, R. 1981. "Substrate
adhesion of rat hepatocytes: A comparison of laminin and fibronectin as 
attachment proteins". J. Cell Biol. 90, 260-264 

7. Hedman, K., Johansson, S., Vartio, T., Kjellen, L., Vaheri, A., Hook, 
M. 1982. "Structure of the pericellular matrix in human fibroblast cultures: 
Association of heparan and chondroitin sulfates with the fibronectin- 
procollagen fibers". Cell 26, 663-671 

8. Linde, A., Johansson, S., Jonsson, R., Jontell, M., 1982. "Localization 
of fibronectin during dentogenesis in rat incisor". Arch. Oral Biol. 27, 1069- 
1073 

9. Bagge, L., Hedstrand, U., Hook, M., Johansson, S., Lind, E., Modig, J., 
Saldeen, T. 1983. "Fibrinolysis inhibition and fibronectin in the blood in 
patients with delayed microembolism syndrome". Upsala J. Med. Sci. 88, 
81-94 

10. Timpl, R., Johansson, S., van Delden, V., Oberaumer, I., Hook, M. 
1983. "Characterization of protease-resistant fragments of laminin 
mediating attachment and spreading of rat hepatocytes". J. Biol. Chem. 
258, 8922-8927 


11. Johansson, S. 1983. "Pericellular matrix components and cell 
adhesion". Doctoral thesis at the University of Uppsala Sweden. 

12. Johansson, S., Hook, M. 1984. "Substrate adhesion of rat hepatocytes: 
On the mechanism of cell attachment to fibronectin". J. Cell Biol. 98, 810- 
817 

13. Hedman, K., Vartio, T., Johansson, S., Kjellen, L., Hook, M., Linker, 
A., Salonen, E-M., Vaheri, A. 1984. "Integrity of the pericellular fibronectin 
matrix of fibroblasts is independent of sulfated glycosaminoglycans". EMBO 
J. 3, 581-584 

14. Johansson, S., Hedman, K., Kjellen, L., Christner, J., Vaheri, A., Hook 
M. 1985. "Structure and interactions of proteoglycans in the extracellular 
matrix produced by cultured human fibroblasts". Biochem. J. 232, 161-168 

15. Smedsrod, B., Johansson, S., Pertoft, H. 1985. " In vivo and in vitro 
studies on the uptake and degradation of soluble collagen in rat liver 
endothelial and Kupffer cells". Biochem. J. 228, 415-424

16. Johansson, S. 1985. "Demonstration of high affinity fibronectin-
receptors on rat hepatocytes in suspension". J. Biol. Chem. 260, 1557-1561 

17. Johansson, S., Smedsrod, B. 1986. "Identification of a 72 kDa plasma
gelatinase in preparations of fibronectin". J. Biol. Chem. 261, 4363-4366 

18. Woods, A., Couchman, J., Johansson, S., Hook, M. 1986. "Adhesion 
and cytoskeletal organization of fibroblasts in response to fibronectin 
fragments". EMBO J. 5, 665-670

19. Johansson, S., Forsberg, E., Lundgren, B. 1987. "Comparison of 
fibronectin receptors from rat hepatocytes and fibroblasts". J. Biol. Chem. 
262, 7819-7824 

20. Perris, R., Johansson, S. 1987. "Amphibian neural crest cell migration 
on purified extracellular matrix components: A chondroitin sulfate 
proteoglycan inhibits locomotion on fibronectin substrates". J. Cell Biol. 
105, 2511-2521 

21. Johansson, S., Gustafsson, S., Pertoft, H. 1987. "Identification of a 
fibronectin receptor specific for liver endothelial cells". Exp. Cell Res. 425- 
431 

22. Wiersma, E. J., Froman, G., Johansson, S., Wadstrom, T. 1987. 
"Carbohydrate specific binding of fibronectin to Vibrio Cholerae cells". 
FEMS Lett. 44, 365-369 


23. LeBaron, R., Esko, J., Woods, A., Johansson, S., Hook, M. 1988. 
"Adhesion of glycosaminoglycan-deficient Chinese Hamster Ovary cell 
mutants to fibronectin substrate". J. Cell Biol. 106, 945-952

24. Woods, A., Johansson, S., Hook, M. 1988. "Fibronectin fibril formation 
involves reorganization of external fibronectin by two cell surface 
components". Exp. Cell Res. 17, 272-283 

25. Carri, N. G., Perris, R., Johansson, S., Ebendal, T. 1988. "Differential 
promotion of neurite outgrowth from retinal explants by purified
extracellular matrix molecules". J. Neurosci. Res. 19, 428-439 

26. Hedin, U., Bottger, B. A., Forsberg, E., Johansson, S., Thyberg, J.
1988. "Diverse effects of fibronectin and laminin on phenotypic properties of
cultured arterial smooth muscle cells". J. Cell Biol. 107, 307-319

27. Hedin, U., Bottger, B. A., Luthman, J., Johansson, S., Thyberg, J.
1989. "A substrate of the cell-attachment sequence of fibronectin (Arg-Gly-
Asp) is sufficient to promote the transition of arterial smooth muscle cells
from a contractile to a synthetic phenotype". Dev. Biol. 133, 489-501

28. Bottger, B. A., Hedin, U., Johansson, S., Thyberg, J. 1989. "Integrin- 
type fibronectin receptors of rat arterial smooth muscle cells: isolation, 
partial characterization and role in cytoskeletal organization and control of 
differentiated properties". Differentiation 41, 158-167 

29. Smedsrod, B., Paulsson, M., Johansson, S. 1989. "Uptake and 
degradation in vivo and in vitro of laminin and nidogen by rat liver cells". 
Biochem. J. 261, 37-42 

30. Perris, R., Johansson, S. 1990. "Inhibition of neural crest cell 
migration by aggregating chondroitin sulfate proteoglycans is mediated by 
their hyaluronan binding region". Dev. Biol. 137, 1-12

31. Hansson, M., Odin, P., Johansson, S., Obrink, B. 1990. "Comparison 
and functional characterization of C-CAM, glycoprotein IIb/IIIa and integrin
beta-1 in rat platelets". Thromb. Res. 58, 61-73 

32. Forsberg, E., Paulsson, M., Timpl, R., Johansson, S. 1990. 
"Characterization of a laminin receptor on rat hepatocytes". J. Biol. Chem. 
265, 6376-6381 

33. Stamatoglou, S. C., Sullivan, K. H., Johansson, S., Bayley, P. M.,
Burdett, I. D., Hughes. 1990 "Localization of two fibronectin-binding 
glycoproteins in rat liver and primary hepatocytes". J. Cell Sci. 97, 595-606 


34. Stamatoglou, S. C., Bawumia, S., Johansson, S., Forsberg, E., Hughes,
C. R. 1991. "Affinity of integrin α1β1 from liver sinusoidal membranes for
type IV collagen". FEBS Lett. 288, 241-243 

35. Warmegard, B., Martin, N., Johansson, S. 1992. "cDNA-cloning and
sequencing of rat alfa-1-macroglobulin". Biochemistry 31, 2346-2352

36. Pujades, C., Forsberg, E., Enrich, C., Johansson, S. 1992. "Changes in
cell surface expression of fibronectin and fibronectin receptor during liver 
regeneration". J. Cell Sci. 102, 815-820 

37. Smilenov, L., Forsberg, E., Zeligman, I., Sparrman, M., Johansson, S. 
1992. "Separation of fibronectin from a plasma gelatinase using 
immobilized metal affinity chromatography". FEBS Lett. 302, 227-230

38. Murtomaki, S., Risteli, J., Risteli, L., Koivisto, U.-M., Johansson, S., 
Liesi, P. 1992. "Laminin and its neurite outgrowth promoting domain
accumulate in the brain in Alzheimer's disease and Down's syndrome". J. 
Neurosci. Res. 32, 261-273 

39. Malmstrom, P.-U., Larsson, A., Johansson, S. 1993. "Urinary 
fibronectin in diagnosis and follow-up of patients with urinary bladder 
cancer". British J. Urol. 72, 307-310

40. Forsberg, E., Ek, B., Johansson, S. 1994. "Purification and 
characterization of integrin α9β1". Exp. Cell Res. 213, 183-190

41. Forsberg, E., Lindblom, A., Paulsson, M., Johansson, S. 1994. 
"Laminin isoforms promote attachment of hepatocytes via different 
integrins". Exp. Cell Res. 215, 33-39 

42. Fassler, R., Pfaff, M., Murphy, J., Noegel, A. A., Johansson, S., Timpl,
R., Albrecht, R. 1995. "Lack of β1 integrin gene in embryonic stem cells
affects morphology, adhesion and migration but not integration into the
inner cell mass". J. Cell Biol. 128, 979-988

43. Prasthofer, T., Ek, B., Ekman, P., Hook, M., Johansson, S. 1995. 
"Protein kinase C phosphorylates two of the four known syndecan
cytoplasmic domains in vitro". Biochem. and Mol. Biol. Int. 36, 793-802. 

44. Holmvall, K., Camper, L., Johansson, S., Kimura, J. H., Lundgren-
Akerlund, E. 1995. "Chondrocyte and chondrosarcoma cell integrins with 
affinity for collagen type II and their response to mechanical stress". Exp. 
Cell Res. 221, 496-503 

45. Hauzenberger, D., Martin, N., Johansson, S., Sundqvist, K.-G. 1996. 
"Characterization of lymphocyte fibronectin". Exp. Cell Res. 222, 312-318 


46. Wennerberg, K., Lohikangas, L., Gullberg, D., Pfaff, M., Johansson, S.,
Fassler, R. 1996. "Integrin beta1-dependent and -independent
polymerization of fibronectin". J. Cell Biol. 132, 227-238 

47. Frieser, M., Hallmann, R., Johansson, S., Vestweber, D., Goodman, S. 
L., Sorokin, L. 1996. "Mouse polymorphonuclear granulocyte binding to 
extracellular matrix molecules involves β1 integrins". Eur. J. Immunol. 26,
3127-3136 

48. Yu, J-L., Johansson, S., Ljung, A. 1997. "Fibronectin exposes different 
domains after adsorption to a heparinized and an unheparinized polyvinyl
chloride surface". Biomaterials, 18, 421-427. 

49. Svineng, G., Fassler, R., Johansson, S. 1998. "Identification of β1C-2, a
novel splice variant of the integrin subunit β1". Biochem. J. 330, 1255-1263.

50. Wennerberg, K., Fassler, R., Warmegard, B., Johansson, S. 1998. 
"Mutational analysis of the potential phosphorylation sites in the 
cytoplasmic domain of integrin β1A. Requirement for threonines 788-789 in
receptor activation". J. Cell Sci. 111, 1117-1126

51. Hirsch, E., Lohikangas, L., Gullberg, D., Johansson, S., Fassler, R. 
1998. "β1 is not essential for skeletal myogenesis in vivo but plays a role in
vitro". J. Cell Sci. 111, 2397-2409.

52. Brakebusch, C., Wennerberg, K., Krell, H. W., Weidle, U. H., Sallmyr,
A., Johansson, S., Fassler, R. 1999. "β1 integrin promotes but is not
essential for metastasis of ras-myc transformed fibroblasts". Oncogene, 18,
3852-3861. 

53. Su, B., Johansson, S., Fallman, M., Patarroyo, M., Granstrom, M., 
Normark, S. 1999. "Signal transduction-mediated adherence and entry of 
Helicobacter pylori into cultured cells". Gastroenterology, 117, 595-604. 

54. McCourt, P. A. G., Smedsrod, B., Melkko, J., Johansson, S. 1999. 
"Characterization of the hyaluronan receptor on liver endothelial cells and 
its functional relationship to the scavenger receptor." Hepatology, 30, 1276-
1286. 

55. Armulik, A., Nilsson, I., von Heijne, G., Johansson, S. 1999.
"Delineation of the border between the transmembrane and cytoplasmic
domains of human integrin subunits". J. Biol. Chem., 274, 37030-37034.

56. Svineng, G., Johansson, S. 1999. "Integrin subunits β1C-1 and β1C-2
expressed in GD25 cells are retained and degraded intracellularly rather
than localized to the cell surface". J. Cell Sci., 112, 4751-4761.


57. Armulik, A., Svineng, G., Wennerberg, K., Fassler, R., Johansson, S.
2000. "Expression of integrin subunit β1B in β1-deficient GD25 cells does not
interfere with αvβ3 functions". Exp. Cell Res., 254, 55-63.

58. Wennerberg, K., Armulik, A., Sakai, T., Karlsson, M., Fassler, R., 
Schaefer, E. M., Mosher, D., Johansson, S. 2000. "The cytoplasmic tyrosines 
of integrin subunit β1 are involved in FAK activation". Mol. Cell. Biol. 20,
5758-5765. 

59. Fowler, T, Wann, E.R., Joh, D, Johansson, S., Foster, T.J., Hook, M. 
2000. "Staphylococcus aureus invasion of mammalian cells involves a 
fibronectin bridge between the fibronectin-binding MSCRAMMs and host β1
integrins". Eur. J. Cell Biol. 79, 672-679.

60. Lohikangas, L., Gullberg, D., Johansson, S. 2001. "Assembly of laminin 
polymers is dependent on β1 integrins". Exp. Cell Res. 265, 135-144.

61. Politz, O., Grachev, A., McCourt, P., Schledzewski, K., Guillot, P., 
Longati, P., Johansson, S., Birk, P., Hakiy, N., Franke, P., Kodelja, V., 
Kannicht, C., Orfanos, C., Johansson, S., Goerdt, S. 2002. "Stabilin-1 and
stabilin-2 constitute a novel family of fasciclin-like hyaluronan receptor
homologues". Biochem. J. 362, 155-164.

62. Gustavsson, A., Armulik, A., Brakebusch, C., Johansson, S., Fallman,
M. 2002. "Role of the β1 integrin cytoplasmic tail in mediating invasin-
promoted internalization of bacteria". J. Cell Sci. 115, 2669-2678. 

63. Velling, T., Risteli, J., Wennerberg, K., Mosher, D.F., Johansson S.
2002. "Polymerization of Type I and III Collagens Is Dependent On
Fibronectin and Enhanced By Integrins α11β1 and α2β1." J. Biol. Chem., 277,
37377-37381.

64. Fowler, T., Johansson, S., Wary, K. K., Hook, M. 2003. "Src kinase
inhibitors control in vitro cellular internalization of Staphylococcus aureus". 
Cell. Microbiol. 5, 417-426. 

65. Armulik, A., Velling, T., Johansson, S. 2004. "The integrin β1 subunit
transmembrane domain regulates PI3K-dependent tyrosine phosphorylation
of CAS". Mol. Biol. Cell, 15, 2558-2567.

66. Stefansson, A., Armulik, A., Nilsson, I-M., von Heijne, G., Johansson, S.
2004. "Determination of N- and C-terminal borders of the transmembrane
domain of integrin subunits". J. Biol. Chem. 279, 21200-21205.

67. Velling, T., Nilsson, S., Stefansson, A., Johansson S. 2004. "β1-Integrins
induce phosphorylation of PKB/Akt on serine 473 independently of focal 
adhesion kinase and Src family kinases". EMBO Reports, 5, 901-905. 


68. Hansen B, Longati P, Elvevold K, Nedredal GI, Schledzewski K, Olsen R, 
Falkowski M, Kzhyshkowska J, Carlsson F, Johansson S, Smedsrod B, 
Goerdt S, McCourt P, Johansson S. (2005) "Stabilin-1 and stabilin-2 are both 
directed into the early endocytic pathway in hepatic sinusoidal endothelium 
via interactions with clathrin/AP-2 independent of ligand binding". Exp Cell
Res., 303, 160-73. 

69. Lannergard J, Flock M, Johansson S, Flock J-I, Guss B. 2005. "Studies
of fibronectin-binding proteins of Streptococcus equi". Infect. Immun. 75,
7243-7251 

70. Nilsson, S., Caniovska, D., Brakebusch, C., Fassler, R., Johansson S.
2006. "Threonine-788 in integrin subunit β1 regulates integrin activation."
Exp Cell Res., 312, 844-853. 

71. Velling, T., Stefansson, A., Johansson S. 2007. "EGFR and β1-Integrins
utilise different signalling pathways to activate Akt" Exp. Cell Res., in press. 


Attorney Docket No. 10142.0001 
Application No.: 09/980,403 


APPENDIX 2 


OF 

DECLARATION UNDER 37 C.F.R. § 1.132 
DATED JANUARY 16, 2008 
BY 

STAFFAN JOHANSSON 


Content: 

Hynes, R.O., Integrins: versatility, modulation, and signaling in cell adhesion. Cell. 1992 Apr
3;69(1):11-25. (15 pages)

Hynes, R.O., Cell adhesion: old and new questions. Trends Cell Biol. 1999 Dec;9(12):M33-7.
(5 pages)

Loftus, J.C., et al., Integrin-mediated cell adhesion: the extracellular face. J Biol Chem. 1994
Oct 14;269(41):25235-8. (4 pages)

Gale, C., et al., Cloning and expression of a gene encoding an integrin-like protein in Candida
albicans. Proc Natl Acad Sci USA. 1996 Jan 9;93(1):357-61. (5 pages)

Berg, R.W., et al., Cloning and characterization of a novel beta integrin-related cDNA coding for
the protein TIED ("ten beta integrin EGF-like repeat domains") that maps to chromosome band
13q33: A divergent stand-alone integrin stalk structure. Genomics. 1999 Mar 1;56(2):169-78.
(10 pages)

Laval, V., et al., A family of Arabidopsis plasma membrane receptors presenting animal beta-
integrin domains. Biochim Biophys Acta. 1999 Nov 16;1435(1-2):61-70. (10 pages)

Rubin, G.M., et al., Comparative genomics of the eukaryotes. Science. 2000 Mar
24;287(5461):2204-15. (12 pages)

Forsberg, E., et al., Purification and characterization of integrin alpha 9 beta 1. Exp Cell Res.
1994 Jul;213(1):183-90. (8 pages)

Munger, J.S., et al., The integrin alpha v beta 6 binds and activates latent TGF beta 1: a
mechanism for regulating pulmonary inflammation and fibrosis. Cell. 1999 Feb 5;96(3):319-28.
(10 pages)


Cell, Vol. 69, 11-25, April 3, 1992, Copyright © 1992 by Cell Press 


Integrins: Versatility, Modulation, and Signaling
in Cell Adhesion

Review


Richard O. Hynes 

Howard Hughes Medical Institute and 
Center for Cancer Research 
Department of Biology 
Massachusetts Institute of Technology 
Cambridge, Massachusetts 02139 


The recognition of integrins as a widely expressed family
of cell surface adhesion receptors is around five years old
(Hynes, 1987). At that time, one could identify about ten
distinct vertebrate integrins; there are now about twenty,
and the number is still rising. Integrins appear to be the
major receptors by which cells attach to extracellular matrices,
and some integrins also mediate important cell-cell
adhesion events. Through these functions they play important
roles both in development and in adult organisms.
Several human genetic diseases affecting integrins demonstrate
their importance in various physiological and
pathological processes, and the ability to interfere with
integrin functions using antibodies or peptides offers many
opportunities for therapeutic intervention in diseases as
diverse as thrombosis, inflammation, and cancer. Because
of these multifarious roles, integrins have been intensively
studied by scientists in many different fields, and
more than one integrin paper a day is now published. Despite
this plethora of information, it is possible to discern
common principles, and, in this brief review, I will attempt
some generalizations and syntheses to make the field accessible
to the nonspecialist. In particular, I will focus on
recent evidence concerning regulation of integrin affinities
and signaling events mediated by integrins.

Multiple Integrins and Multiple Ligands 

All integrins are αβ heterodimers. The α subunits vary in
size between 120 and 180 kd and are each noncovalently
associated with a β subunit (90-110 kd). Most integrins
are expressed on a wide variety of cells, and most cells
express several integrins. The table summarizes the diversity
of vertebrate integrins as currently understood. There
are 8 known β subunits and 14 known α subunits, all of
which have been sequenced at the cDNA level except α7
and αIEL. References for most of the subunits are given in
earlier reviews (Albelda and Buck, 1990; Arnaout, 1990;
Hemler, 1990; Springer, 1990a, 1990b; Ruoslahti, 1991).

Although the nomenclature given in Table 1 is the most 
widely used, some earlier names still persist in the literature.
The platelet-specific integrin, αIIbβ3, is often referred
to as GPIIb-IIIa (Kieffer and Phillips, 1990; Phillips et al.,
1991), and other integrins expressed on platelets are
sometimes given names based on earlier platelet glycoprotein
nomenclature (GPIa-IIa = α2β1, GPIc-IIa = α5β1 plus
α6β1). The leukocyte-specific β2 integrins are still referred
to by earlier names (αLβ2 = LFA-1; αMβ2 = Mac-1, Mo-1,
or CR3; αXβ2 = p150,95); these are the only integrins for
which CD nomenclature is frequently used (β2 = CD18;
associated α subunits = CD11a,b,c). Some other integrin
subunits also have assigned CD numbers (Hemler, 1990;
Springer, 1990a, 1990b), but these are rarely used. Finally,
several β1 integrins are sometimes referred to as VLA (very
late after activation) antigens, a name that arose from the
time at which α1β1 and α2β1 appear on lymphocytes
(Hemler, 1990). However, since most cells in the body
express one or more β1 integrins constitutively, and since
some of the α subunits occur in association with other β
subunits, the αβ nomenclature is more widely applicable.

Although 8 β subunits and 14 α subunits could in theory
associate to give more than 100 integrin heterodimers, the
actual diversity appears to be much more restricted. Many
α subunits can associate with only a single β subunit. For
instance, white blood cells express both β1 and β2 integrins,
but each α subunit associates with only one of the
two β subunits. Thus, subfamilies with shared β subunits
can be defined (see table). However, several α subunits
(α4, α6, αv, and perhaps others) can associate with more
than one β subunit; αv is particularly promiscuous in this
respect. The discovery of four new β subunits (Ramaswamy
and Hemler, 1990; Suzuki et al., 1990; McLean et
al., 1990; Sheppard et al., 1990; Yuan et al., 1990, 1991;
Erle et al., 1991; Moyle et al., 1991) and two new α subunits
(Kramer et al., 1991; von der Mark et al., 1991; Bossy et
al., 1991) in the past two years suggests that others may be
found as new techniques (e.g., polymerase chain reaction)
are increasingly applied, and newly discovered subunits
may turn out to have additional associations not yet recognized.
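
The combinatorial point in the preceding paragraph can be checked with a short calculation. The sketch below is illustrative only: the function and variable names are hypothetical, and the observed figure is simply the "about twenty" integrins quoted in the introduction, not a count derived independently.

```python
def theoretical_heterodimers(n_alpha: int, n_beta: int) -> int:
    """Upper bound if every alpha subunit could pair with every beta subunit."""
    return n_alpha * n_beta


N_ALPHA = 14          # known alpha subunits, as stated in the text
N_BETA = 8            # known beta subunits, as stated in the text
OBSERVED_APPROX = 20  # "about twenty" integrins (assumed round figure from the introduction)

upper_bound = theoretical_heterodimers(N_ALPHA, N_BETA)
fraction_observed = OBSERVED_APPROX / upper_bound

print(upper_bound)                  # theoretical alpha-beta pairings
print(f"{fraction_observed:.0%}")   # share of pairings actually reported
```

Under these assumptions, fewer than one in five of the conceivable αβ pairings corresponds to a known receptor, which is the restriction the text describes.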

A further level of complexity is introduced by the exis- 
tence of alternative splicing. In mammals, several subunits 
have alternatively spliced cytoplasmic domains. These include
β1 (Altruda et al., 1990), β3 (van Kuppevelt et al.,
1989), β4 (Suzuki and Naitoh, 1990; Hogervorst et al.,
1990; Tamura et al., 1990), α3 (Tamura et al., 1991; C. M.
DiPersio and R. Hynes, unpublished data), and α6 (Hogervorst
et al., 1991; Cooper et al., 1991). Human αIIb can be
alternatively spliced in the extracellular domain close to
the membrane (Bray et al., 1990), the Drosophila PS2 α
subunit can be alternatively spliced in a region adjacent to
the metal-binding sites (Brown et al., 1989), and the PS3
β subunit can be alternatively spliced in the ligand-binding
domain (G. Yee, R. Patel-King, F. Chen, and R. Hynes, 
unpublished data). Possible implications of the alternative 
splicing will be discussed later. 

The functions (ligand and adhesive specificity) of individ- 
ual integrins have been elucidated using cell adhesion 
assays, monoclonal antibodies, and affinity chromatogra- 
phy. Examination of Table 1 quickly reveals that individ- 
ual integrins can often bind to more than one ligand. 
Equally, individual ligands are, more often than not, recog- 
nized by more than one integrin. Thus, earlier designations 
of integrins as fibronectin or vitronectin receptors have 
proven too restrictive— there are several receptors for 
each of these proteins, and many of them are not highly 
specific for individual adhesive ligands. The majority of the

Table 1. The Integrin Receptor Family

Subunits           Ligands and Counterreceptors                          Binding Site

β1 (a)    α1       Collagens, laminin
          α2       Collagens, laminin                                    DGEA (h)
          α3 (a)   Fibronectin, laminin, collagens (c)                   RGD ± ? (i)
          α4       Fibronectin (V25), VCAM-1                             EILDV (j)
          α5       Fibronectin (RGD)                                     RGD
          α6       Laminin
          α7       Laminin
          α8       ?
          αv       (d)                                                   RGD
β2        αL       ICAM-1, ICAM-2
          αM       C3b component of complement (inactivated),
                   fibrinogen, factor X, ICAM-1
          αX       Fibrinogen, C3b component of complement               GPRP
                   (inactivated)?
β3        αIIb     Fibrinogen, fibronectin, von Willebrand factor,       RGD, KQAGDV (k)
                   vitronectin, thrombospondin
          αv       Vitronectin, fibrinogen, von Willebrand factor,       RGD
                   thrombospondin, fibronectin, osteopontin, collagen
β4        α6 (a)   Laminin ? (e)
β5        αv       Vitronectin                                           RGD
β6        αv       Fibronectin (f)                                       RGD
β7 (= βp) α4       Fibronectin (V25), VCAM-1 (g)                         EILDV (j)
          αIEL (b) ?
β8        αv       ?

The current spectrum and interactions of vertebrate integrins are listed. cDNA sequences for all subunits except α7 and αIEL have been reported
(see data banks or earlier reviews for references). Several subfamilies exist, each with 2-9 α subunits and a common shared β subunit (β1, β2, β3,
or β7). In addition, several of the α subunits can interact with other β subunits (β4-β8). Each αβ receptor recognizes one or more extracellular ligands
or counterreceptors on other cells. It should be noted that the ligand specificity of a given receptor can be markedly affected by its environment
or state of activation (see text). The peptide recognition sequences in the ligands are given where known; where none is given, the evidence indicates
that the recognition sequence is not RGD.

(a) These subunits can have alternatively spliced cytoplasmic domains.
(b) The subunit designated αIEL is expressed specifically on intraepithelial lymphocytes (IEL) in association with β7 (Yuan et al., 1991; Parker et al.,
1992). However, αE has previously been used for α6 in association with β4 (Kajiji et al., 1989), so this term has not been used here to avoid confusion.
(c) A recent paper (Carter et al., 1991) describes a ligand designated epiligrin that is not fully characterized.
(d) Publications differ as to the specificity of this receptor (Bodary and McLean, 1990; Vogel et al., 1990).
(e) The specificity of this receptor is controversial (Lotz et al., 1990; Sonnenberg et al., 1990).
(f) Busk et al. (1992).
(g) The specificity of this receptor is not yet clear. α4β7 on some cells can apparently bind the same ligands as α4β1 (Ruegg et al., 1992), whereas
on others it requires activation and is much less effective than α4β1 (Chan et al., 1992a).
(h) Defined in type I collagen only (Staatz et al., 1991).
(i) Publications differ as to whether or not this receptor recognizes RGD (Wayner et al., 1988; Hynes et al., 1989; Elices et al., 1991).
(j) Defined only in the alternatively spliced V segment of fibronectin.
(k) RGD is recognized in all ligands; KQAGDV is recognized only in fibrinogen γ chain.


ligands listed in the table are extracellular matrix proteins 
involved in cell-substratum adhesion. However, some 
of these, such as fibrinogen, can also mediate cell-cell 
aggregation, and some integrins recognize integral 
membrane proteins of the immunoglobulin superfamily 
(ICAM-1, ICAM-2, VCAM-1) and mediate direct cell-cell 
adhesion. 

Considerable progress has been made in defining the 
integrin recognition sites in the ligands and counterrecep- 
tors listed in the table. The first binding site to be defined 
was the Arg-Gly-Asp (RGD) sequence present in fibronec- 
tin, vitronectin, and a variety of other adhesive proteins. 
This tripeptide sequence is recognized by several integrins
(α5β1, αIIbβ3, and all or most αvβ integrins) but not by
most others. αIIbβ3 recognizes, in addition, the sequence
Lys-Gln-Ala-Gly-Asp-Val (KQAGDV) in fibrinogen. Other
integrins recognize different sequences: α2β1 binds
Asp-Gly-Glu-Ala (DGEA) in type I collagen, α4β1 binds
Glu-Ile-Leu-Asp-Val (EILDV) in an alternatively spliced segment
of fibronectin, and it has recently been reported that αXβ2
binds Gly-Pro-Arg-Pro (GPRP) in fibrinogen (Loike et al.,
1991). Other binding sites have not yet been defined as
precisely, although the various laminin receptors recognize
specific parts of the laminin molecule (e.g., Hall et al.,
1990), and those integrins binding immunoglobulin superfamily
counterreceptors recognize specific immunoglobulin-like
domains (Staunton et al., 1990; Diamond et al.,
1991).

Figure 1 summarizes some of the relationships and interactions
of individual integrin subunits. It is based on a
dendrogram relating the sequences of various α subunits.
The sequence homologies allow one to cluster the α subunits
in several subgroups, and Figure 1 also depicts other
relationships that reinforce this conclusion. Thus, all the
RGD-reactive integrins are relatively closely related, and
all contain cleaved α subunits without I domains (see below).
There are two separate clusters of laminin receptors.
The members of the first group (α3β1, α6β1, and α7β1) all
recognize the long arm of laminin, and all contain cleaved
α subunits. The second group (α1β1 and α2β1) is distantly
related to the first group, although α1 and α2 are very similar
to each other (50% identity, and both are uncleaved α
subunits with I domains). α1β1 recognizes the cross region
of laminin; it is a reasonable prediction that α2β1 binds to
the same region. It appears that laminin recognition has
evolved twice in different integrin subsets. Similarly, recognition
of two different sequences in fibronectin apparently
has evolved twice (α5β1 recognizes RGD, and α4β1,
EILDV), as has recognition of different parts of fibrinogen
by β2 and β3 integrins.

Figure 1. Structural, Functional, and Evolutionary Relationships
among Integrins

The figure depicts schematically the sequence similarities among different
human α subunits. Also indicated are various structural and
functional features, such as whether or not α subunits are posttranslationally
cleaved, the presence or absence of I domains (homologous
segments of 180 amino acids inserted in the extracellular domains of
some α subunits), and interactions of individual α subunits with various
β subunits. All α subunits except αIIb and the three β2-associated subunits
can bind to β1; only αv can bind to more than two β subunits.
Shaded box indicates the subset of related integrins that recognize
RGD sequences, and the stippled boxes indicate two distantly related
groups of laminin-binding integrins. (α7 is not yet fully sequenced, but
its N-terminal sequence is most closely related to that of α6.) The two
sets of laminin-binding integrins appear to recognize different parts of
the laminin molecule, suggesting that laminin recognition may have
evolved twice. The gene structures of αIIb (Heidenreich et al., 1990)
and αX (Corbi et al., 1990) are related to each other and to that of
Drosophila PS2 α (Brown et al., 1989), confirming the ancient evolutionary
origin of α subunits.

Since the sequences and genes of Drosophila integrins 
are about as closely related as those of the most divergent 
vertebrate subunits, it is clear that integrins arose at a very 
early point in evolution, before divergence of the protostome
and deuterostome lineages. Indeed, one can argue
that metazoans would require integrins or something anal- 
ogous to maintain multicellularity. It is clear that subse- 
quent divergence of the integrin subunits has allowed de- 
velopment of great versatility in the cell adhesion mediated 
by these receptors.


Transmembrane Topography and Interactions 

Both subunits of integrins are transmembrane glycoproteins,
each with a single hydrophobic transmembrane segment
(Figure 2). In most integrins, the cytoplasmic domains
are short (50 amino acids or less). β4 is a notable
exception: its cytoplasmic domain comprises over 1000
amino acids. The extracellular domains (>75 kd for β subunits,
and >100 kd for α subunits) associate to form the αβ
heterodimers. Truncated forms lacking transmembrane
and cytoplasmic domains can be expressed and do form
functional αβ dimers (Dana et al., 1991; Bodary et al.,
1991), indicating that αβ interactions do not rely on the
transmembrane or cytoplasmic domains; this conclusion
is also supported by the selective α associations of chimeric
β subunits (Solowska et al., 1991). Electron microscopic
images of several integrins show a globular head
apparently comprising parts of both subunits and two
stalks extending to the lipid bilayer (Carrell et al., 1985;
Kelly et al., 1987; Nermut et al., 1988). Both subunits contain
extensive disulfide bonding, the patterns of which
have been partially elucidated (Calvete et al., 1989, 1991).
Consistent with a model of compact folded domains, integrins
are fairly resistant to proteolysis of intact cells.

These and other observations give rise to the models 
depicted in Figure 2. Characteristic of all β subunits is a
four-fold repeat of a cystine-rich segment believed to be
internally disulfide bonded (Calvete et al., 1991). The
N-terminal 40-50 kd is tightly folded with internal disulfide
loops and contributes to the ligand-binding domain (see
below). The α subunits all contain a seven-fold repeat of
a homologous segment; the last three or four of these
repeats contain sequences (Asp-x-Asp-x-Asp-Gly-x-x-Asp
or related sequences) that likely contribute the divalent
cation-binding properties of these subunits. Divalent cations
are essential for receptor function. The nature of the
cations can affect both affinity and specificity for ligands,
and divalent cations are necessary for αβ subunit associations
of some integrins (Gailit and Ruoslahti, 1988; Kirchhofer
et al., 1990a, 1991). This part of the α subunit also
contributes to the ligand-binding domain (see below).

Some α subunits are posttranslationally cleaved to give
a 25-30 kd transmembrane chain disulfide-bonded to a
larger, wholly extracellular chain (Figure 2). Other α subunits
contain an extra segment of around 180 amino acids,
known as an I domain, which is inserted before the last
five homologous repeats containing the cation-binding domains.
The functions of the I domains are unknown, but
they are homologous to collagen-binding domains of von
Willebrand factor and to cartilage matrix protein and
complement proteins. I domains are characteristic of
β2-associated α subunits and of the α1 and α2 subunits,
which contribute to the collagen-laminin receptors α1β1
and α2β1 (Figure 1). The best current guess is that the I
domains contribute ligand-binding functions to these integrins,
but that remains to be proven. Interestingly, one of
the Drosophila integrin α subunits contains an alternatively
spliced segment that can be inserted just N-terminal
to the cation-binding domains (Brown et al., 1989). This is
close to the position of the I domains and may also affect

Figure 2. Structural Features of Integrin Receptors

(a) shows the overall shape, as deduced from
electron microscopy, as well as the putative
locations of the cystine-rich repeats of the β
subunit (crosshatched) and metal-binding sites
in the α subunit (M2+). The shaded area represents
the ligand-binding region that is known
to be made up from both subunits based on
cross-linking and binding data.

(b) schematizes the arrangement of the polypeptide
chains with the cystine repeats internally
folded and the head region of the β subunit
containing internal disulfide loops, some
but not all of which are shown. A disulfide bond
from the middle of the β subunit to a point close
to the membrane has been proposed (Calvete
et al., 1991) but is omitted here for clarity. Xs
indicate positions of mutations (of human β2 or
β3 subunits) known to affect ligand binding or
αβ dimerization. The positions of alternatively
spliced segments in Drosophila subunits are
shaded.


ligand-binding specificity or β subunit associations (see
also Figure 2 and below). There is also alternative splicing
in the ligand-binding domain of a Drosophila β subunit (G.
Yee, R. S. Patel-King, F. Chen, and R. O. Hynes, unpublished
data). So far, no such splicing of ligand-binding domains
has been detected in vertebrates.

The position of the ligand-binding site(s) within the integrin
subunits can be deduced from several lines of evidence.
Chemical cross-linking data on the two β3 integrins,
using peptides bound by these integrins, place the ligand
in proximity both to the divalent cation-binding domains
of the α subunits and to the segment from approximately
residues 100-200 of the β3 subunit (D'Souza et al., 1988,
1990; Smith and Cheresh, 1988, 1990). A point mutation
at position 119 of the β3 subunit ablates ligand binding
(Loftus et al., 1990), and several mutations in the corresponding
region of the β2 subunit affect αβ associations
(Kishimoto et al., 1989; Wardlaw et al., 1990; Arnaout et
al., 1990). These and other data are consistent with the
models shown in Figure 2, in which both α and β subunits
contribute to the ligand-binding site that lies at or near the
interface between the two subunits. This is consistent with
the observation (see table) that switching either the α or
the β subunits in integrins can lead to changes in ligand
specificity. As mentioned earlier, it is of interest that the I
domains, when present, and the alternative splicing events
detected in Drosophila integrin subunits all fall in the same
regions of the receptors.

Therefore, the current picture is that the N-terminal domains
of α and β subunits combine to form a ligand-binding
head on each integrin. This head is connected by two
stalks, each made up of one of the two subunits, to the
membrane-spanning segments and thus to the two cytoplasmic
domains. These cytoplasmic domains are believed
to interact with cytoskeletal proteins and perhaps
with other cytoplasmic components. The evidence for cytoskeletal
connections comes from a variety of sources
and includes light and electron microscopic evidence for
colocalization of integrins and cytoskeletal structures (see
Burridge et al., 1988, for review), fluorescence photobleaching
evidence for restricted mobility of integrins in
focal contacts, which are points of cell-substratum and
cytoskeleton-membrane contact (Duband et al., 1986),
and biochemical evidence for interactions of integrins or
cytoplasmic domain peptides with the cytoskeletal proteins
talin (Horwitz et al., 1986; Tapley et al., 1989) and
α-actinin (Otey et al., 1990). Deletion of all or part of the
β1 cytoplasmic domain interferes with associations with
focal contacts (Solowska et al., 1989; Hayashi et al., 1990;
Marcantonio et al., 1990).

There is less direct evidence for interactions of α subunit
cytoplasmic domains with the cytoskeleton, but it is of
interest that different α subunits have very different cytoplasmic
sequences, and that different receptors for a given
ligand can differ in their apparent associations with the
cytoskeleton. For example, α5β1 is in focal contacts,
whereas α3β1 is not, even though both interact with fibronectin
(Elices et al., 1991). Similarly, αvβ3 is in focal contacts,
whereas αvβ5 is not, even when both interact with
the RGD site in vitronectin (Wayner et al., 1991). Such
results indicate the importance of different integrin subunits
in mediating differing cellular responses to common
extracellular ligands. A recent paper analyzing chimeric
integrin α subunits shows that different cytoplasmic domains
trigger different functions (collagen gel contraction
or migration) when transfected into cells (Chan et al., 
1992b). The existence of alternative cytoplasmic domains 
on several integrin subunits presumably contributes fur- 
ther versatility although there is as yet no evidence con- 
cerning their functions. 

Although most integrins are thought to interact somehow
with the actin-based cytoskeleton, α6β4 clearly plays
a different role. This integrin with its large β4 cytoplasmic
domain is specifically concentrated at hemidesmosomes
in epithelial cells (Stepp et al., 1990; Sonnenberg et al.,
1991; Kurpakus et al., 1991), where it most likely interacts
somehow with intermediate filaments, which are charac- 
teristically associated with hemidesmosomes. 

The details and subtleties of integrin-cytoskeleton asso- 
ciations need much further study, but it seems clear that 
one major function of integrins is to mediate cytoskeletal 
interactions at the inner face of the membrane at sites of 
cell-substratum or cell-cell adhesion. I will return later 
to the possibility that integrins may also transmit other 
transmembrane signals and to the general question of how 
extracellular ligand occupancy is coupled to intracellular 
events. 

Modulation of Integrin Affinities and Specificities 

Given the wide variety of integrins, individual cells can and 
do vary their adhesive properties by selective expression 
of integrins. Further versatility is introduced by the ability 
of cells to modulate the binding properties of integrins. The 
ligand specificities of different integrins shown in the table 
are those that can be demonstrated for each receptor. 
However, in a given cell, a particular integrin may not exhibit
all the specificities listed. For instance, α2β1 on platelets
is specific for collagen and not laminin (Staatz et al.,
1989), whereas on other cells it can recognize both ligands
(Elices and Hemler, 1989; Kirchhofer et al., 1990b). While
the possibility is not completely ruled out that this particular
difference could reflect variant forms of α2β1 (e.g., alternatively
spliced forms), there are other instances in which a
purified integrin displays different ligand specificities depending
on context (lipids [Conforti et al., 1990] or divalent
cations [Kirchhofer et al., 1990a, 1991]).

Perhaps more significantly, the specificity and affinity of 
a given integrin receptor on a given cell are not always 
constant. There are numerous examples of modulation 
of integrin function. Both activation and deactivation of 
integrin functions have been reported. The best understood
examples are αIIbβ3 on platelets (Kieffer and Phillips,
1990; Phillips et al., 1991) and β2 integrins on neutrophils,
monocytes, and lymphocytes. These will serve as a basis 
to discuss the general principles that very likely apply to 
many other instances of integrin modulation. 

Integrin αIIbβ3 on resting circulating platelets does not
bind any of its soluble ligands, which is a good thing, since
such binding would lead to thrombosis. Unactivated platelets
bind to surface-bound fibrinogen via αIIbβ3 and can
thus join hemostatic events already underway. However,
only after platelet activation by thrombin, collagen, or other
platelet agonists does αIIbβ3 become an effective receptor
for soluble fibrinogen or the other ligands listed in the table
(Kieffer and Phillips, 1990; Phillips et al., 1991). This activation
event has been known for a long time and is still not
fully understood. Activation is accompanied by a conformational
change in the αIIbβ3 receptor that can be detected
immunologically (Shattil et al., 1985; Gulino et al., 1990;
Kouns et al., 1990; O'Toole et al., 1991a; Andrieux et al.,
1991) or biophysically (Parise et al., 1987; Sims et al.,
1991). A further conformational change occurs on ligand
binding (Frelinger et al., 1988, 1990, 1991).

Activation of αIIbβ3 can be accomplished by activation of
the platelets, an event that involves activation of several
G proteins, increases in intracellular pH and Ca2+, phosphatidyl
inositol turnover, and activation of protein kinases
(Manning and Brass, 1991; Shattil and Brugge, 1991). Receptor
activation can also be produced by certain antibodies
against the receptor, either in intact cells or with solubilized
receptor (O'Toole et al., 1991a; Gulino et al., 1990;
Kouns et al., 1990; Frelinger et al., 1991). Interestingly,
activation can also be accomplished by the ligands themselves
(Du et al., 1991).

All these results are consistent with a conformational 
switch (or switches) between states of the extracellular 
domain of αIIbβ3, normally driven from within the cell after
its activation. Such a model would also predict that the 
activated state would be favored by interactions with mole- 
cules that bind to the activated state of the extracellular 
domain (e.g., antibodies, ligands). How could such a con- 
formational change be driven from within the cell? Some 
clues come from recent results using recombinant DNA 
expression methods. αIIbβ3 expressed in heterologous
cells (O'Toole et al., 1991a; Kieffer et al., 1991) is in the
resting or unactivated state. The activated state can be 
induced by monoclonal antibodies; even solubilized re- 
ceptor can be activated by these monoclonal antibodies. 
Thus, the two states are intrinsic to the receptor itself. 

O'Toole et al. (1991b) have shown that deletion of the
cytoplasmic domain of αIIb leads to a receptor that is constitutively
active. Interestingly, substitution of the α5 cytoplasmic
domain did not restore the unactivated state of the
receptor. Thus, the cytoplasmic domain of αIIb in some way
controls the binding affinity of the extracellular domain
of αIIbβ3, maintaining the unactivated state. In activated
platelets, this control is lifted. How this is achieved remains
unclear, but the various second messenger pathways triggered
in activated platelets suggest phosphorylation (of
the integrin or of associated proteins) or lipid mediators
as candidates. While phosphorylation of αIIbβ3 has been
reported, it is of low stoichiometry and uncertain signifi- 
cance (Hillery et al., 1991; Shattil and Brugge, 1991). 

The β2 integrins (reviewed in Arnaout, 1990; Larson and
Springer, 1990) expressed on leukocytes exhibit activation
phenomena that are strikingly similar to those shown
by αIIbβ3. Activation of leukocytes is required for expression
of the various ligand-binding activities of the β2 integrins
(see table). Activation is accompanied by a conformational
change(s) in the β2 integrins that can be detected by specific
monoclonal antibodies, including one, 7E3, originally
isolated by its ability to recognize activated αIIbβ3 (Altieri
and Edgington, 1988; Coller, 1985). Furthermore, at least
one of these monoclonal antibodies (NKI-L16) "activates"
the functions of αLβ2 (Keizer et al., 1988; van Kooyk et
al., 1991). This monoclonal appears to be recognizing an
activated state of the receptor and stabilizing it. Just as
discussed earlier for αIIbβ3, all the data are consistent with
switches between unactivated and activated conformations
of the β2 receptors.

The effective activation stimuli for β2 integrins vary depending
on cell type. The earliest observations were on
monocytes, in which αMβ2 (complement receptor CR3) can
be activated by phorbol esters (Wright and Silverstein,
1982) or by adherence to fibronectin (Wright et al., 1983,
1984). These results presaged many more recent papers
demonstrating two key features of integrin function: regulation
from within the cell (inside-out signaling) and modulation
of cellular behavior by extracellular matrix (outside-in
signaling). Greatest progress in understanding the β2
integrins has been made in neutrophils and lymphocytes,
and I will concentrate on recent results on these two systems,
which shed some light on the activation phenomena.

Neutrophils and monocytes need to attach to endothelial 
layers in order to leave the bloodstream at sites of inflam- 
mation. These extravasation events involve several differ- 
ent adhesive proteins (reviewed in Carlos and Harlan, 
1990; Osborn, 1990). Central among these are the β2 integrins:
in the genetic disease leukocyte adhesion deficiency
(LAD), β2 integrins are absent, and leukocytes cannot
adhere stably to the endothelium. In the normal
process of arrest and extravasation, the first event is that
the leukocytes roll along the vessel wall. This is mediated
by adhesion receptors of the selectin family (Bevilacqua
et al., 1991; Lasky and Rosen, 1992; Smith et al., 1991; 
von Andrian et al., 1991; Lawrence and Springer, 1991). 

Selectin-mediated rolling is necessary but not sufficient
for β2 integrin-mediated adhesion. The latter requires activation
of the integrins, which on circulating leukocytes are
in their inactive state. Activation can be accomplished by
phorbol esters or, more physiologically, by various inflammatory
mediators (e.g., tumor necrosis factor, C5a, platelet
activating factor, or fMet-Leu-Phe). Hermanowski-Vosatka
et al. (1992) have recently described a lipid
mediator purified from neutrophils stimulated by such agonists,
which activates αMβ2 or αLβ2 either in cells or as
purified receptors. Exactly how this lipid modulates integrin
affinity is unclear, but its effects are reminiscent of
earlier results showing that the specificity of purified αvβ3
for different ligands is modulated by the lipid composition
of the liposomes used for the assays (Conforti et al., 1990).

Thus, the current view of leukocyte adhesion to endothelium
at sites of inflammation invokes a multistep process
involving initial unstable adhesion mediated by selectins
(whose expression on the endothelium is induced by inflammation),
followed by activation of leukocyte β2 integrins
by inflammatory mediators (produced by the endothelium
or underlying inflamed tissue), and finally, strong
adhesion of the β2 integrins to counterreceptors (also induced
on the endothelial cells). Thus, there is an adhesive
cascade leading to function of the β2 integrins only in the
appropriate places: the integrins provide the necessary
strong adhesion, but the specificity comes from the
involvement of multiple receptors and, crucially, from the
activation steps (Butcher, 1991; Hynes and Lander, 1992).

An analogous situation exists for T lymphocytes. Specificity
of their adhesion to antigen-presenting cells comes
from the T cell receptor, which recognizes antigenic peptides
bound to major histocompatibility molecules. However,
adhesion also relies on αLβ2 integrin (lymphocyte
function antigen-1, or LFA-1) and can be blocked by antibodies
to this integrin. αLβ2 binds to ICAM-1 on the target
cells, but this is not an antigen-specific interaction. It turns
out that cross-linking of either the CD3 component of the
T cell receptor or of the costimulating receptor, CD2, activates
αLβ2 on the T cells (Dustin and Springer, 1989; van
Kooyk et al., 1989). Activation via CD3 is transient, allowing
both adhesion and deadhesion. Thus, strong, antigen-specific
adhesion is again mediated by an adhesion cascade:
weak but specific adhesion via the T cell receptor-CD3
complex triggers activation of αLβ2, leading to strong
adhesion. As in leukocytes, the integrin provides the
adhesive strength, but the activation steps provide the
specificity.

How is this activation accomplished? It is known that CD2 and CD3
cross-linking can activate protein kinase C in lymphocytes, and phorbol
esters readily activate αLβ2 in lymphocytes (Dustin and Springer, 1989;
van Kooyk et al., 1989), just as in leukocytes. These results suggest
the involvement of protein kinase C in the activation pathways in both
cell types. The transience of the activation mediated by T cell
receptor-CD3 suggests that both activation and deactivation mechanisms
exist, and treatments that elevate cAMP abrogate the T cell
receptor-CD3-mediated activation of αLβ2 (Dustin and Springer, 1989).

Recombinant DNA expression experiments are providing some information
about possible sites of regulation, although as for αIIbβ3, it is not
yet possible to discern the details. In contrast with αIIbβ3, αLβ2
expressed in heterologous cells is constitutively active (Larson et al.,
1991). When transfected into LAD mutant lymphoblastoid lines,
recombinant αLβ2 exhibits phorbol-induced activation for binding to
ICAM-1 (Hibbs et al., 1991a). In partial conformity with results on
αIIbβ3, this binding activity is modulated by cytoplasmic domain
sequences; deletions of the β2 cytoplasmic domain lead to expression of
inactive receptor (Hibbs et al., 1991a). These inactive receptors can be
partially activated by the activating monoclonal antibody NKI-L16 or by
phorbol esters. Truncations of the α subunit have no effect. Therefore,
as for αIIbβ3, the cytoplasmic domains of β2 integrins are targets for
regulatory events. These could include phosphorylation or binding of
some cytoplasmic component (cytoskeleton or other). Phorbol esters
induce phosphorylation on serine residue(s) of the β2 cytoplasmic domain
in monocytes and neutrophils (Chatila et al., 1989; Buyon et al., 1990;
Valmu et al., 1991), but there is no direct evidence that this event
affects function. In fact, mutagenesis of the β2 cytoplasmic domains can
dissociate phosphorylation from activation in lymphoid cells (Hibbs et
al., 1991b).

In addition to these relatively well-studied examples of integrin
modulation, there are many other cases in which intracellular events
affect the affinity of various integrins. Most are listed on the
left-hand side of Figure 3 and will be mentioned briefly here. β1
integrins are widely expressed on lymphocytes and leukocytes (Hemler,
1990; Shimizu and Shaw, 1991). As a generalization, the levels of β1
integrins increase after antigen stimulation. As in other cell types,
the β1 integrins mediate attachment of lymphocytes to extracellular
matrix proteins and likely play a role in extravasation and migration of
activated lymphocytes in tissues during immune responses (e.g., Ferguson
et al., 1991). α4β1, which recognizes VCAM-1 on activated endothelial
cells as well as an alternatively spliced segment of fibronectin, is
involved in attachment of lymphocytes to endothelial layers under some
conditions (B. R. Schwartz et al., 1990; Shimizu et al., 1991).

Figure 3. Signaling via Integrin Receptors

The receptors undergo conformational changes between at least two states: inactive (closed symbol) and active (open symbol). Only in the latter
state do they bind most of their ligands. Signaling via integrins takes two forms: regulation of the affinity and conformation of the receptor from inside
the cell (inside-out signaling), and triggering of intracellular events by ligand occupation of the receptors (outside-in signaling).

Activation of T cells by antigen or by phorbol esters leads to
activation of α2β1, α4β1, α5β1, and α6β1 without changes in surface
levels (Shimizu et al., 1990b; Chan et al., 1991; Wilkins et al., 1991).
These changes in activity are over and above the changes in surface
levels that occur on conversion from naive to memory cells and lead to
increased adhesion of the activated T cells to collagen, fibronectin,
and laminin. Therefore, it seems clear that integrins in lymphocytes
undergo activation in a fashion similar to that described above for
αLβ2.

In contrast with these activation events, there are several reported
cases in which integrins lose activity during development but persist on
the surface. This is so for α5β1 in teratocarcinoma cells (Dahl and
Grabel, 1989) and in keratinocytes (Adams and Watt, 1990) and for α6β1
in retinal neurons (Neugebauer and Reichardt, 1991). Interestingly, loss
of α5β1 activity in teratocarcinoma cells correlates with loss of
phosphoserine labeling of the integrin, although there is no evidence
for causal linkage.

There are two reported instances in which phosphorylation of β1
integrins is associated with apparent inactivation of the receptors. The
first is during oncogenic transformation by Rous sarcoma virus; pp60src
phosphorylates a tyrosine residue in the β1 subunit and this appears to
reduce binding of β1 integrins to both talin and fibronectin (Tapley et
al., 1989). Phosphorylation of β1 by pp60src is absent in cells
transformed by a virus mutant in pp60src that fails to induce cell
rounding and loss of fibronectin (Horvath et al., 1990), supporting the
correlation of tyrosine phosphorylation of β1 with inactivation. The
second potential inactivating phosphorylation is of a serine two
residues from the tyrosine phosphorylated by pp60src. This serine
becomes phosphorylated in mitotic cells and the α5β1 from such cells no
longer binds to fibronectin, consistent with the rounding and detachment
of cells during mitosis (C. Grandori and R. Hynes, unpublished data).

Therefore, both activation and inactivation of various 
integrins can be mediated from within cells. The exact 
mechanisms are not fully elucidated, but presumptive evi- 
dence exists in several cases, implicating interactions with 
the cytoplasmic domains of integrin subunits. As men- 
tioned earlier, the cytoplasmic domains of different inte- 
grins differ substantially in sequence and in several cases 
can be alternatively spliced. The potential modifications 
and interactions of these cytoplasmic domains with cy- 
toskeletal and regulatory components should be a fruitful 
area for research in the next couple of years. So much 
for our current understanding of inside-out signaling via 
integrins. What about the possibility that integrin receptors 
can transmit signals into cells? 

Do Integrins Transmit Signals into Cells? 

There is increasing evidence that integrins do mediate information
transfer into cells. The principal processes in which integrin-mediated
signaling has been implicated are listed on the right-hand side of
Figure 3.

In blood platelets, there is good evidence that tyrosine phosphorylation
events occurring upon platelet activation require ligand (i.e.,
fibrinogen) occupation of the major αIIbβ3 integrin for full expression
(reviewed in Shattil and Brugge, 1991). Platelets can be activated by a
variety of agonists, including thrombin, epinephrine, and ADP, all of
which act via seven-transmembrane-helix receptors coupled to G proteins
(Manning and Brass, 1991). Another platelet agonist, collagen, binds to
an integrin receptor α2β1 (Staatz et al., 1989). The second-messenger
pathways activated by these and other agonists include several
phospholipases, phosphatidylinositol turnover, elevation of cytoplasmic
pH and Ca2+, and activation of protein kinases. Platelets contain
several protein tyrosine kinases, and extensive tyrosine phosphorylation
occurs on platelet activation (Shattil and Brugge, 1991, and references
therein). Absence of αIIbβ3 or blockade of fibrinogen interaction with
the activated receptor by antibodies or peptides interferes with the
tyrosine phosphorylation events (Ferrell and Martin, 1989; Golden et
al., 1990). Ligand binding of αIIbβ3 appears necessary but not
sufficient for the phosphorylation response.

Several protein tyrosine kinases of the src family are associated with
another adhesion receptor of platelets, GPIV or CD36, which is a
receptor for thrombospondin and possibly collagen (Shattil and Brugge,
1991; Huang et al., 1991). During platelet aggregation, fibrinogen and
thrombospondin interact, which could easily lead to coclustering of the
two receptors in the plane of the membrane. The nature of the
phosphorylated proteins and their role in subsequent activation events
is unknown, but it seems reasonably clear that ligand binding of αIIbβ3
integrin, GPIV, and possibly α2β1 can contribute to signal transduction
events in platelets. One can perhaps view them as coreceptors, with the
seven-transmembrane-helix receptors mediating the actions of soluble
agonists of platelets.

Tyrosine phosphorylation events triggered via integrins have also been
described in KB carcinoma cells and NIH 3T3 fibroblasts. Cross-linking
of α3β1 integrin in KB cells leads to transient tyrosine phosphorylation
of protein(s) of 115-130 kd (Kornberg et al., 1991), and adhesion and
spreading of NIH 3T3 cells on fibronectin or on anti-integrin antibodies
leads to rapid tyrosine phosphorylation of a protein of similar size
(Guan et al., 1991). Both are probably the same protein as one described
by Kanner et al. (1990) as a tyrosine-phosphorylated protein in
pp60src-transformed cells. They may also be related to proteins of
similar size that are phosphorylated after stimulation of cells with
various soluble growth factors (Rees-Jones and Taylor, 1985; Sadoul et
al., 1985; Pasquale et al., 1988). If, indeed, all these proteins are
related, it would suggest convergence of the signaling pathways
triggered by soluble growth factors, extracellular matrix adhesion
receptors, and tyrosine kinase oncogene products. That, in turn, could
offer potential explanations for the anchorage dependence of growth of
normal cells and its loss in transformed cells (Guan et al., 1991).

The stimulation of tyrosine phosphorylation via integrins in platelets
and adherent cells is of some interest, since tyrosine phosphate is
found concentrated at cell-substratum and cell-cell contact points
(Maher et al., 1985; Tsukita et al., 1991). Recognition of
tyrosine-phosphate sites by proteins containing SH2 (src homology 2)
domains (Koch et al., 1991) could contribute to the associations of
structural and regulatory proteins into the submembranous cytoskeletal
structures at such points of cell contact.

A second, well-established cytoplasmic event triggered via integrins is
cytoplasmic alkalinization. Adhesion of fibroblasts, endothelial cells,
and lymphocytes to fibronectin causes elevation of cytoplasmic pH, which
correlates with the parallel stimulation of spreading and growth
(Schwartz et al., 1989, 1991a; Ingber et al., 1990). Furthermore, by
attaching the fibronectin or antibodies against α5 or β1 integrins to
beads, it can be demonstrated that the effect on cytoplasmic pH is
integrin mediated and can be uncoupled from effects on spreading or
growth (Schwartz et al., 1991b). Since constitutively elevated
cytoplasmic pH correlates with anchorage independence of growth in
transformed cells (M. A. Schwartz et al., 1990), one can postulate, as
above for tyrosine phosphorylation, that cell adhesion signals
transmitted via integrins converge with those triggered by soluble
growth factors and oncogenes. The parallels between the platelet and
adherent cell systems are extended by the report that binding of
fibrinogen to αIIbβ3 is necessary for the rise in cytoplasmic pH
triggered in platelets by epinephrine (Banga et al., 1986).

Therefore, in a variety of cell types, occupation of inte- 
grin receptors by their ligands leads to tyrosine phosphory- 
lation and cytoplasmic alkalinization. A reasonable work- 
ing hypothesis for both these cytoplasmic events (which 
may indeed be connected) is that integrin receptors syner- 
gize with receptors for soluble agonists in stimulating 
these signals (see below and Figure 4). 

Evidence for integrins as costimulatory receptors is also available for
T lymphocytes. Ligand engagement of several integrins (α3β1, α4β1, α5β1,
α6β1, and αLβ2) can act as a costimulus with cross-linking of the T cell
receptor-CD3 receptor on T cells in stimulating cell proliferation
(Matsuyama et al., 1989; Shimizu et al., 1990a; Davis et al., 1990;
Nojima et al., 1990; van Seventer et al., 1990; Burkly et al., 1991).
Furthermore, fibronectin binding to α5β1 has been shown to induce the
AP-1 transcription factor necessary for IL-2 transcription (Yamada et
al., 1991), and αvβ3 has been identified as an accessory costimulator of
γδ T cells for IL-4 production (Roberts et al., 1991). Therefore,
extracellular matrix and cell adhesion can have major effects on the
activation of T cells, and these effects are mediated via integrins;
clearly these are important receptors on lymphocytes, where they play a
number of roles (Hemler, 1990; Shimizu and Shaw, 1991).

Integrins can also act as stimulatory receptors for monocytes and
neutrophils. Adherence of monocytes to extracellular matrix molecules
induces genes encoding inflammatory mediators (Thorens et al., 1987;
Sporn et al., 1990). Adherence of neutrophils via β2 integrins acts as a
costimulus with cytokines for induction of the respiratory burst (Nathan
et al., 1989), and this adhesion also induces cell motility and Ca2+
transients in the cytoplasm (Jaconi et al., 1991; Ng-Sikorski et al.,
1991).

From all these results, it is clear that integrins can act as true
signaling receptors in a variety of cell types. This conclusion offers
the potential for explaining results in a wide variety of systems, in
which it has been shown that ligand or antibody binding of specific
integrins affects gene expression or differentiation of specific cell
types. These include induction of specific protease genes in synovial
fibroblasts via α5β1 (Werb et al., 1989), inhibition of terminal
keratinocyte differentiation by fibronectin acting via a β1 integrin,
probably α5β1 (Adams and Watt, 1989), and modulation of myogenesis
(Menko and Boettiger, 1987).

Figure 4. Integrins as Generators and Receivers of Signals

The figure summarizes both data and hypotheses discussed in the text. In
each of the three cell types depicted, integrins have been shown to
trigger intracellular signals, often in synergy with other receptors.
The integrins are depicted as sending signals (thin black arrows) into
the second-messenger pathways of the cells (black boxes). These signals
converge with those from G protein-coupled agonist receptors or protein
tyrosine kinase-coupled receptors. The consequences of cell activation
include activation of specific integrins and thus enhanced cell adhesion
(large grey arrows) and other responses, such as cell proliferation,
secretion, and morphological change (large black arrows).

(a) Platelet. Fibrinogen and collagen each bind to integrins (αIIbβ3 and
α2β1) and can act as agonists. Both ligands also interact with
thrombospondin, whose receptor GPIV/CD36 has protein tyrosine kinases
(black dots) associated with its cytoplasmic domain. Clustering of the
surface receptors is necessary for triggering tyrosine phosphorylation
and may facilitate interactions of kinases with their substrates. It is
proposed that one consequence of all the signals is activation of
αIIbβ3, leading to strong cell adhesion.

(b) Lymphocyte. Several integrins can act as costimulatory receptors
(for signal 3) with the T cell receptor-CD3-CD4,8 receptor complex and
its associated kinases (black dots) leading to T cell activation, the
consequences of which include activation of integrins for strong cell
adhesion to other cells or to extracellular matrix.

(c) Fibroblast. Extracellular matrix molecules, acting in part through
integrins, stimulate tyrosine phosphorylation inside the cell, as do
growth factor receptors and, in transformed cells, pp60src. These
signals may converge or synergize to produce cell responses, such as
proliferation. Synergy between soluble growth factors and matrix
adhesion would give anchorage dependence of growth. Replacement of the
need for the extracellular matrix signal by a src kinase signal could
lead to anchorage independence of transformed cells. The multiple
domains of extracellular matrix proteins could cluster receptors to
generate combined signals (denoted 1) analogous to those generated by
clustering receptors in T cells (see part b).

Conclusions and Speculations

From the foregoing discussion, it should be clear that integrins play
many roles in many cells. The twenty or so known integrins (see table)
offer the possibility of great versatility in cell adhesion, and this
versatility is probably further increased by alternative splicing. What
has become clear in the last few years is that integrins are not simply
adhesion sites on cell surfaces. The activities of many integrins can be
radically modulated by cells, and they in turn can modulate cell
activities in ways that extend far beyond adhesion.

The view of integrins as two-way signaling molecules is summarized in
Figure 3. A particularly important feature of many (perhaps all)
integrins is that they undergo activation. It is commonly the case in an
adhesion process that integrins provide the strong adhesion but only
after activation by other stimuli, which can include soluble mediators
(hormones, cytokines, etc.) and/or solid-phase reactants (extracellular
matrix or other cells). The specificity of the overall adhesion event
lies in the coupling of activation of the final adhesion receptor, often
an integrin that is not intrinsically highly specific, to a cascade of
signals triggered by specific and/or local events. We discussed four
examples: αIIbβ3 in platelets, αLβ2 on lymphocytes, β2 integrins on
leukocytes, and β1 integrins in lymphocytes. In each case, it is an
eminently reasonable model that the adhesion via the integrin is
activated at the appropriate time and place by input from more specific
signals (respectively, thrombogenic agonists, antigen, selectins plus
cytokines, and T cell activation, in the four cases mentioned).

Of equal importance with activation of integrins is their inactivation.
It is crucially important that cells should not attach at the wrong
times and places. Platelets and leukocytes offer two prime examples in
which inappropriate adhesion leads to thrombosis and inflammation,
respectively. Thus, the constitutively inactive state of the integrins
on these cells is vital. Similarly, attached cells need on occasion to
detach (e.g., during cell migration or mitosis). Less is known about
these inactivation events, but hints are beginning to appear that they
may be regulated by phosphorylation.

Apart from modulation of affinity, the second major prop- 
erty of integrins reviewed here is their role as signaling 
receptors. Recent data have uncovered striking parallels 
in a number of systems (see Figure 3 and earlier discus- 
sion). Figure 4 brings together data on several cell systems 
to allow some parallels to be drawn and some speculations 
offered. In all three cell types depicted, there is evidence 
that integrins can act as signaling receptors. In most cases 
they act as coreceptors with other more traditional recep- 
tors, such as the G protein-coupled receptors for platelet 
agonists or the protein kinase receptors for growth factors 
in fibroblasts. The signaling events attributed to integrins 
parallel and synergize with those due to the soluble li- 
gands. This view of integrins as coreceptors fits equally 
well the results in T lymphocytes, where the antigen- 
specific T cell receptor (T cell receptor-CD3-CD4,8 com- 
plex) is well known to be costimulatory with CD2. Recent 
data show that various integrins can also serve as corecep- 
tors with the T cell receptor-CD3-CD4 complex. 

Thus, in each of these cell types, one can propose a set of signals
converging on a common set of regulatory circuits inside the cells (the
black boxes in Figure 4). The details of these circuits are unclear but,
in each case, appear to include activation of phospholipases,
phosphatidylinositol turnover, cytoplasmic alkalinization, elevation of
intracellular Ca2+, and activation of protein kinase C. (All three cell
types in Figure 4 as well as leukocytes can be activated by phorbol
esters.) In each case, ligand occupancy of integrins can trigger signals
that feed into this circuitry and parallel and/or complement the signals
from the other receptors.

How do integrins signal? As discussed earlier, there is 
evidence for activation of protein tyrosine kinases as a 
consequence of cross-linking of integrins by ligands (fibrin- 
ogen or fibronectin) or antibodies. The nature of the ki- 
nases involved is unknown, although in platelets, kinases 
of the src family are associated with another adhesion 
receptor, GPIV/CD36 (Huang et al., 1991; Shattil and 
Brugge, 1991). Thrombospondin, the ligand of GPIV/CD36,
is known to cross-link with fibrinogen and collagen,
both ligands for integrins, at the surfaces of aggregating 
platelets. Thus, it is reasonable to propose that cross- 
linking of these adhesion receptors brings them together, 
clustering protein tyrosine kinases in a submembranous 
patch where they become activated and/or react with their 
substrates. 

There is striking similarity of this model with the current model of T
cell activation via the T cell receptor-CD3-CD4 complex (Rudd, 1990;
Klausner and Samelson, 1991; Shaw and Thomas, 1991). Here it is thought
that protein tyrosine kinases of the src family associated with the
short cytoplasmic domains of the T cell receptor and CD4/8 are brought
together with CD3 by interaction with the complex of antigen and major
histocompatibility complex molecule on an antigen-presenting cell,
leading to subsequent signaling events. I should stress here that there
is as yet no published evidence for tyrosine kinases associated with the
short cytoplasmic domains of integrins, but it would certainly be
worthwhile looking for them, given the evidence for tyrosine
phosphorylation triggered by integrins in several cell types. Also of
interest is the 50 kd integrin-associated protein that appears to be
involved in integrin-mediated activation of phagocytosis by leukocytes
(Brown et al., 1990).

A final speculation is stimulated by the parallels with the 
T lymphocyte system. It concerns the nature of extracellu- 
lar matrix proteins with their modular structure and multi- 
ple sites for interactions with cell surface receptors, includ- 
ing but not limited to integrins. It has been a puzzling 
question as to why these molecules have so many distinct 
and different sites for interacting with cells. Could it be that 
they are designed to cross-link several different surface 
receptors together in the plane of the membrane, thus 
inducing an organized patch of submembrane associates 
that could interact to generate cytoskeletal structures and/ 
or to trigger signaling events? 

The latter proposal is schematized for fibroblasts in Fig- 
ure 4c, in which an extracellular matrix molecule (e.g., 
fibronectin) is depicted to interact both with an integrin and 
with a proteoglycan. The two receptors together generate 
a signal in a fashion analogous with the signal(s) generated 
on clustering of T cell receptor-CD3-CD4 in T cells (Figure 
4b). One could view the extracellular matrix molecule as 
the equivalent of the antigen-presenting cell. Evidence ex- 
ists that segments of fibronectin containing multiple do- 
mains trigger greater tyrosine phosphorylation than do 
simpler cell-binding domains (Guan et al., 1991). Similarly,
larger domains promote more cytoskeletal organization 
(Obara et al., 1988; Woods et al., 1988). Work in progress 
on various matrix molecules will test this sort of model in 
detail. 

The parallels among the different systems that I have 
stressed here are undoubtedly more complex than the 
present discussion implies. Equally, each system has indi- 
vidual features that I have underemphasized, in order to 
make some generalizations and propose some testable 
hypotheses concerning functions of integrins in cells. 
When the integrin receptor family was first recognized sev- 
eral years ago, some people questioned whether integrins 
were "true" receptors or whether they were "simply" in- 
volved in adhesion. The results in the past few years have 
demonstrated clearly that integrins are indeed receptors, 
in the sense of transmitting signals both into and out of 
cells. Furthermore, it has become clear that there is no such thing as
simple adhesion. Rather there is a versatile and complex array of
interactions, modulations, and signaling events in which integrins play
a central role.

Cell adhesion: old and new questions

Richard O. Hynes

Metazoans clearly need cell adhesion to hold themselves together, but adhesion does much more than that. 
Adhesion receptors make transmembrane connections, linking extracellular matrix and adjacent cells to the 
intracellular cytoskeleton, and they also serve as signal transducers. In this article, I briefly summarize our 
present understanding of the molecular basis and biological consequences of cell adhesion and discuss how our 
current knowledge sheds light on questions of specificity of cell adhesion. I offer some thoughts and speculations 
about the evolution of cell-adhesion molecules and processes, consider their inter-relationships with other forms of 
cell-cell communication and discuss unresolved questions ripe for investigation as we enter the postgenomic era. 



Even a cursory consideration of metazoan anatomy and 
development forces the realization that the associations of 
cells in epithelia, their attachment to basement membranes 
and the migrations of cells and projections of neurons all require 
selective adhesion of cells to one another and to extracellular 
matrices (ECMs). Recognition of this requirement led to a spirited 
debate between proponents of a large number of highly selective 
adhesion receptors, and advocates of models in which quantita- 
tive differences in adhesive strength, without necessarily a large 
spectrum of individual specificities, were invoked to explain dif- 
ferential cell adhesion. Similarly, the phenomenon of induction, 
in which one tissue influences the developmental fate of adjacent
tissues, clearly relies on cell-cell interactions, and experimental 
embryologists attempted to define whether induction relies on 
diffusible signals or on cell-cell or cell-matrix contacts. Neither 
the issue of specificity of cell adhesion nor the question of the
mechanistic bases of induction could be resolved without mol- 
ecular biology. Now, with the benefit of a couple of decades of 
molecular analysis, we can see that there is some truth to all of the 
earlier models. The specificity of cell adhesion comes from com- 
binatorial expression and interactions among a large, but not 
unlimited, number of adhesion receptors, and induction relies on 
diffusible ligands binding to receptors, on cell-cell contacts and 
on cell-matrix adhesion. The distinctions among these three 
mechanisms are not actually that great - adhesion receptors 
signal much like receptors for growth factors and should be 
considered in parallel with them. 


Before considering the biological functions of cell adhesion, we 
need to define the players. Figure I in Box 1 diagrams the struc- 
tures of representative cell-cell adhesion receptors. Fortunately, 
many adhesion receptors fall into a relatively small number of 
families, the major ones being shown in Fig. I. Other families of 
adhesion receptors, such as syndecans and other membrane- 
bound proteoglycans, the disintegrin family and others are less 
well understood at this time. In addition to their roles in binding 
cells to their neighbours (Fig. 1) or to ECM (Fig. 1), engagement 
of cell-adhesion receptors has major effects on many aspects of 
cell behaviour - cell shape and polarization, cytoskeletal organiz- 
ation, cell motility, proliferation, survival and differentiation. 
How do they accomplish all these functions? 

Cytoskeletal connections 

Crucial to the effects of adhesion receptors on intracellular 
organization and cell motility is the fact that their cytoplasmic 
domains connect to the cytoskeleton. Figure 1a shows how
integrins bind to linker proteins, which in turn make direct 
and indirect connections to F-actin filaments, thus establishing 
a mechanical link between the fibrils of the ECM and the 
filaments of the cytoskeleton 9,10. The connection of classic
cadherins to the actin cytoskeleton that occurs at cell-cell junctions
is analogous, although the molecules involved are different
(Fig. 2a) 1,11,12. Although integrins appear to be the major receptors
for ECM, they are not the only ones. One well-studied
example, of considerable interest because of its involvement in

Box 1. Major classes of cell-adhesion receptors
(a) Cadherins 

Cadherins are primarily and centrally involved in cell-cell adhesion
(Fig. I). The so-called classic cadherins (shown) currently number ~20
in vertebrates 1. Their extracellular domains contain five
characteristic cadherin repeats, each comprising a sandwich of β sheets.
Cadherins mediate Ca2+-dependent homophilic (like-with-like) adhesion
between cells through the most distal cadherin repeats. Classic
cadherins share homologous cytoplasmic domains that link to the actin
cytoskeleton. Both structural and functional analyses suggest that the
functional unit is a dimer as shown. As for other adhesion receptors,
clustering of cadherins is important for their functions, and multiple
dimer-dimer interactions are believed to provide sufficient local
avidity to mediate cell-cell adhesion. Desmosomal cadherins
(desmocollins, desmogleins), although related to classic cadherins in
their extracellular domains, have distinct cytoplasmic domains that link
to intermediate filaments. Other subclasses of the cadherin superfamily
are known as protocadherins 2 - 1 , and these typically have six
cadherin repeats. Unlike classic and desmosomal cadherins, each of which
is encoded by a separate genetic locus, protocadherins appear to be
encoded by complex genetic loci with multiple (15-22) tandem exons each
encoding one entire extracellular and transmembrane domain upstream of a
single common cytoplasmic domain 3. Each protocadherin subfamily is
encoded by one such complex locus, but the mechanisms by which
individual family members are generated remain unclear.



[Fig. I: diagram not reproduced; surviving labels: plasma membrane, plasma membrane]

(b) Immunoglobulin superfamily 

The second major class of adhesion receptors comprises the
immunoglobulin superfamily (Ig-SF), characterized by the presence of
varying numbers of Ig-related domains 4. Like cadherin domains, these
are sandwiches of two β sheets held together by hydrophobic
interactions. This is a stable structure that occurs also in another
domain common among adhesion molecules: fibronectin type III (Fn3)
domains (boxes), which frequently occur in tandem with Ig domains
(circles) in cell-adhesion receptors. Fn3 domains also occur in adhesive
proteins of the extracellular matrix (ECM) such as fibronectin and
tenascin and in the ligand-binding domains of cytokine receptors. Since
homologous Ig/Fn3 receptors occur in insects, nematodes and vertebrates,
this arrangement is clearly evolutionarily ancient. Indeed, these two
domains probably originated in the context of cell-adhesion receptors
early in metazoan evolution; their later appearance in immunoglobulins
and fibronectin appears restricted to chordates.

The Ig superfamily is diverse, numbering well over 100 members 
in vertebrates. In addition to adhesion receptors containing both Ig 
and Fn3 repeats such as N-CAM (b), numerous molecules with one or 
more Ig domains play roles in cell-cell interactions in the immune 
system and elsewhere. Different Ig-SF members participate in
homophilic interactions, as shown here for N-CAM, or in heterophilic
interactions with other Ig-SF members, with integrins [see panel (d)
and below] or with ECM proteins (e.g. DCC-netrins, see article by
Tessier-Lavigne and Goodman in this issue). Where they have been
mapped, the interaction sites typically are in the distal Ig domains.
There are fewer data on dimerization, clustering and cytoskeletal
connections than for cadherins, although some evidence suggests that
such interactions also contribute to the functions of Ig-SF receptors.

(c) Selectins 

Another well-studied group of cell adhesion receptors comprises 
the selectins and their counter-receptors 5,6. The figure shows
a heterophilic interaction between a selectin (P-selectin) and
its counter-receptor, a heavily glycosylated protein (PSGL-1).
Binding is through the C-type lectin domain (pink) in the selectin,
which recognizes specific carbohydrate groupings in the
counter-receptor/ligand.

Unlike cadherins and Ig-SF members, which are evolutionarily
ancient and widely expressed, selectins are currently known only in
cells of the vertebrate circulation (endothelium and blood cells),
although other lectins are widely distributed. Given the great
potential for specificity that lies in carbohydrate structures, it
seems likely that additional carbohydrate-specific receptors, such 
as galectins and the C-type lectins expressed by natural killer cells, 
will be increasingly recognized to be important. 

Selectins and their ligands play a crucial role in the adhesion of
leukocytes to endothelium, where their cooperation with integrins
and Ig-SF receptors is one of the best-understood examples of
cell-adhesion specificity, which arises from tightly regulated display
and interaction among a limited number of receptors 5,6 . 

(d) Integrins 

The final major family of adhesion receptors is the integrins 7,8.
Unlike all the others, these are heterodimers. In mammals, there are
genes for eighteen α and eight β integrins; many α-β combinations
fail to occur but at least two dozen are well defined. Most integrins
are predominantly or exclusively receptors for ECM proteins such as
fibronectins, laminins and collagens (Fig. 1a), but a few also play
important roles in heterotypic cell adhesion, most notably of
leukocytes, where they bind to counter-receptors of the Ig
superfamily (ICAMs, VCAM-1, MAdCAM-1) or, in one case, a cadherin
(αEβ7-E-cadherin). The figure shows a heterophilic interaction
between an Ig-SF receptor (ICAM-1) and an integrin; the binding
site is in the distal Ig repeats in ICAM-1 and involves both
subunits in the integrin. Integrins play a central role in cell adhesion to
basement membranes, in the polarization of cells induced by that 
adhesion and in cell migration upon and through ECM. 
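
As a rough illustration of the subunit arithmetic in this panel, the sketch below counts the pairings that eighteen α and eight β subunits could form in principle and compares that with the heterodimers described as well defined. The variable names are illustrative, and the value 24 is only a lower bound read from "at least two dozen".

```python
# Illustrative arithmetic only; the subunit counts are those quoted in the text.
N_ALPHA = 18        # mammalian integrin alpha subunit genes
N_BETA = 8          # mammalian integrin beta subunit genes
WELL_DEFINED = 24   # "at least two dozen" heterodimers (lower bound, assumed)


def possible_heterodimers(n_alpha: int, n_beta: int) -> int:
    """Pairings available if every alpha could pair with every beta."""
    return n_alpha * n_beta


total = possible_heterodimers(N_ALPHA, N_BETA)
fraction = WELL_DEFINED / total
print(f"{total} conceivable pairings; at least {WELL_DEFINED} well defined "
      f"({fraction:.0%} of the theoretical maximum)")
```

On these numbers only about one in six of the conceivable α-β pairings is actually observed, which is the point of the remark that many combinations fail to occur.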


muscular dystrophies, is the dystroglycan complex, which
connects dystrophin/actin inside the cell to laminin and/or agrin in
the extracellular matrix (Fig. 1b) 13. Although studied most
extensively in muscle cells, analogous dystroglycan complexes clearly
function in other cells. 


Transmembrane structural connections, as shown in Fig. 1 and
also demonstrated for other adhesion receptors [e.g.
hyaluronan/CD44/ezrin-radixin-moesin (ERM) proteins], appear to be a
common feature. There are preliminary indications that some
immunoglobulin superfamily (Ig-SF) receptors also make cytoskeletal

FIGURE 1. Transmembrane connections between the extracellular matrix (ECM) and the cytoskeleton. (a) Integrins (α5β1, α6β1) comprise the
major receptors for ECM proteins, such as fibronectin (pink) and laminins (green), as shown here. Their extracellular domains bind to specific
sites in the ECM proteins. Their cytoplasmic domains bind to submembranous cytoskeletal proteins such as talin (yellow) and α-actinin (lilac)
and, through them, to other linkers such as vinculin (brown) or to actin microfilaments (grey). Additional cytoplasmic proteins (red) are also
recruited; many of these function in signalling (see Fig. 2b). (b) Dystroglycan (αβ) together with sarcoglycans (blue) form another transmembrane
link to laminin or to agrin (not shown) and bind via dystrophin (black) to actin filaments. Classic cadherins also link to the actin
cytoskeleton via catenins (see Fig. 2a). For both integrins and cadherins, variant members of the families with divergent cytoplasmic domains
(α6β4 integrin, desmogleins and desmocollins) connect instead to intermediate filaments via desmoplakins and other linker proteins.


connections (e.g. N-CAM/fodrin, ICAMs/ERM proteins) and that 
selectins or their counter-receptors might make similar connections
that lead to their clustering on microvilli. The connections to the
cytoskeleton affect not only intracellular organization but also cell
adhesion itself. The adhesive functions of integrins and cadherins
depend upon these cytoskeletal connections. Some of this
dependence is presumably related to the clustering necessary to provide
sufficient local avidity for stable cell adhesion. However, at least
for integrins and possibly for other adhesion receptors, there can be
more to it than that. Connection to the cytoskeleton can 'activate'
integrins, changing their conformation and increasing their ability
to bind to ligands. This ability to control the affinity and/or avidity
of integrins is crucial to proper cell adhesion and is known as
'inside-out' signalling 7,14. As we will see in the next section, integrins
and cadherins are in fact two-way signalling receptors, and the 
same might be true for most adhesion receptors. 

Signal transduction by adhesion receptors 

A fundamental advance in the past decade has been the
demonstration that cell-adhesion receptors transduce signals. This is best
understood for integrins, which display a repertoire of
signal-transduction capabilities at least as diverse as most growth-factor
receptors (Fig. 2b) 14-16. Their effects include activation of
Rho-family GTPases leading to changes in cytoskeletal organization,
activation of mitogen-activated protein (MAP) kinase pathways
and activation of an array of protein and lipid kinases. These
signalling pathways allow integrins to influence cell-cycle progression,
cell survival and gene expression in addition to their effects on cell
adhesion and morphology. In fact, most cells will not proliferate or
survive unless they are adhering to a substrate (so-called anchorage
dependence). Provision of soluble growth factors such as epidermal
growth factor (EGF) or platelet-derived growth factor (PDGF) is 
not sufficient; input from integrin signalling is also necessary, and 
there is considerable crosstalk and cooperation between integrins 
and growth-factor receptors. This cooperation occurs at many 


levels, ranging from membrane-proximal interactions, in which 
the different types of receptor influence each other's activity, to 
multiple inputs into common pathways. Indeed, it is not realistic 
to consider either adhesion receptors or growth-factor receptors
separately - they are part of an integrated system. 

This integration is clearly demonstrated by the
cadherin/β-catenin system 11,12. β-catenin is a cytoskeletal connector of classic
cadherins, but it is also a central player in signal transduction, 
functioning as a transcriptional activator whose levels are elevated 
in response to Wnt signalling (Fig. 2a). The interplay between 
cell-cell adhesion and the Wnt signalling pathway is complex, with 
each affecting the other, just like the interplay between integrins 
and tyrosine kinase receptors. Other members of the cadherin 
superfamily presumably affect different signalling pathways; 
protocadherins fall into subfamilies, each with a distinct
cytoplasmic domain, and one protocadherin subclass was first identified
by its interactions with the Src-family kinase Fyn 2 . 

It is also becoming clear that integrins, at least, do not signal by 
themselves; they are frequently associated with accessory
transmembrane molecules (tetraspanins, CD47, caveolin, syndecans) that
contribute to the diversity of their signalling capacities 17. It is possible
to draw an analogy with the well-analysed T- and B-cell receptors and
their multiple associated signalling molecules 18,19. There are also
indications that other adhesion receptors function as constituents of 
complexes involving multiple signalling molecules. One can readily 
extrapolate from the current data and postulate that most or all 
signal transduction relies on associations among multiple receptors, 
including both adhesion receptors and receptors for soluble ligands. 

A receptor continuum: soluble ligands to ECM to 
cell-cell contact 

There is, in fact, little or no justification for drawing a distinction 
between adhesion receptors and receptors for soluble ligands; both 
signal, often affecting the same signal-transduction pathways. Indeed,
many 'soluble' growth factors often do not function as truly soluble

FIGURE 2. Signalling mediated by adhesion receptors. (a) Classic cadherins bind to β-catenin through their cytoplasmic domains. β-catenin can link via α-actinin
to the actin cytoskeleton or it can bind to a large protein complex containing adenomatous polyposis coli (APC) and the serine/threonine kinase glycogen
synthase kinase 3β (GSK3β). The latter phosphorylates β-catenin, targeting it for degradation by the proteasome. Wnt binding to its receptor, Frizzled (Frz),
leads to inhibition of GSK3β, allowing β-catenin to accumulate and bind to the transcription factor Lef-1/TCF. The β-catenin-Lef-1 complex moves to the
nucleus and activates transcription. Thus, the balance between cadherin association, degradation and Wnt signalling controls the level of β-catenin-Lef-1.
(b) Integrins activate a large array of signalling intermediates, including small GTPases (red), protein kinases (green), cytoskeletal proteins (yellow) and others.
Acting through these intermediates, which can also be activated by various growth-factor receptors, integrins can greatly affect many biological responses (blue
boxes). Abbreviations: FAK, focal-adhesion kinase; MLCK, myosin light chain kinase; PI3K, phosphoinositide 3-kinase.


molecules. Many (transforming growth factor β, fibroblast growth
factors, Wnts, Hedgehogs) bind in one way or another to the ECM
and are presented to their signal-transduction receptors as insoluble
mediators. The whole concept of morphogenetic gradients
incorporates the idea that morphogens are both soluble and anchored. So
the boundary between soluble ligands and ECM ligands is blurred.
Similarly, receptors that mediate cell-cell contacts such as the
T-cell receptor 18 have much in common with those binding soluble
or bound antigen or antibody (B-cell receptor, Fc receptor) 19.
Receptor pairs such as the eph/ephrin 20 and Notch/Delta/Serrate
families 21, Sevenless/Boss and receptor tyrosine phosphatases 22 all
share domains or signal-transduction mechanisms, or both,
with growth factors, ECM or classical growth-factor receptors. In
some cases, these receptors have been shown to mediate cell-cell 
adhesion. In other words, there is considerable commonality of 
evolution and function among the different types of receptors. 

If we return to the question of embryonic induction first raised 
70 years ago by experimental embryologists and reconsider the 
debate as to whether induction relies on soluble factors, extracel- 
lular matrix or cell-cell contact, that question now seems somewhat 
moot. The answer is that all three can, and typically do, contribute, 
but they are part of a continuum, and all feed into a common 
network of intracellular signals with much synergy and crosstalk 
among them. A major challenge ahead of us is to understand the 
integration of all these inputs to generate coherent responses. 

Where do we stand and where do we go from here? 

Given what we now know about adhesion receptors, what can we 
say about the specificity of cell adhesion? Is it due to a very large 
number of receptors, sufficient for example to confer identity on 
each retinal axon or synapse? How many adhesion receptors are 
there in the genome? With the sequence of the first metazoan 
genome, that of Caenorhabditis elegans [see articles in Science 
(1998) 282, 2011-2046], we can begin to answer these questions
- some of the answers are surprising. 


One striking result from the C. elegans sequence is the discovery 
of a very large number of genes that encode ECM proteins. What 
are all these proteins for? They could serve purely structural roles 
or act as docking sites for presentation of growth factors, gradients 
of morphogens or chemoattractants. The ECM performs such 
functions in vertebrates, and even well-studied matrix proteins 
such as fibronectin, tenascin and agrin contain many highly con- 
served segments whose functions remain completely obscure. There 
is clearly a great deal that we do not understand about the func- 
tions of the ECM. Will a similar plethora of putative matrix pro- 
teins emerge from the fly and vertebrate genome sequences? There 
is every reason to believe that they will; the discovery of new 
matrix proteins continues apace even before the flood of genomic 
sequence data. One recent example is the discovery of netrins 23 as
axonal guidance molecules. The large number of matrix proteins 
is not matched by a large number of integrins. Does this mean 
that the integrins are very promiscuous, that other matrix recep- 
tors exist or that these putative matrix proteins do not interact 
directly with cells? 

There appear to be only two integrins in C. elegans. Strikingly,
the two integrin α genes appear related to two distinct
subfamilies of vertebrate integrins, one that binds to laminins (α3, α6,
α7) and one that binds to proteins containing the sequence
RGD, such as fibronectin and vitronectin (α5, α8, αv, αIIb) 24.
Drosophila also contains clear representatives of each of these two
integrin subfamilies 24. Thus, these two subfamilies apparently
evolved prior to the divergence of nematodes, arthropods and
chordates. The same is true for laminin and type IV collagen,
although not for fibronectin, which is absent from nematodes
(and apparently also flies) and might be a vertebrate invention. It
is plausible to argue that some very early metazoan evolved
laminin and collagen to build a basement membrane and
integrins for cells to attach to this membrane. During evolution,
vertebrates have acquired multiple integrin genes. How many
more will we find when the Drosophila, human and mouse
genomes are sequenced in the next few years? The limited
repertoire in C. elegans might suggest not many. On the other hand,
the number of known cadherin/protocadherin genes has more
than doubled in just the past year with the application of human 
genomic analyses. This provides a glimpse of what might be just 
around the corner. 

The C. elegans genome has 18 genes that contain cadherin 
repeats; we already know of more than 70 in humans, and the 
number is rising fast. Why do we need so many more integrins 
and cadherins than worms do? One obvious suggestion might 
be the elaboration of our nervous system; many cadherins and
protocadherins are expressed in the brain, apparently
differentially in different brain regions or in individual neurons 2,25.
Could they provide selectivity in neuronal or synaptic adhesion
along the lines of the chemoaffinity hypothesis proposed 60 years
ago by Sperry 26? Both classic and protocadherins, as well as
integrins, are expressed at synapses 27-29. The recent discovery of
multiple genes encoding protocadherins raises the exciting
possibility that a large number of adhesion receptors confer synaptic
selectivity. If these protocadherins can form heterodimers or
heteromultimers, then the number of potential combinations
becomes very large 2. The tantalizing organization of the
protocadherin loci, with multiple variable exons and a common constant
region is reminiscent of immunoglobulins or T-cell receptors 5 . 
There is currently no evidence for DNA rearrangements at these 
loci, although mutations in some genes responsible for repair of 
double-strand breaks lead to selective apoptosis of early post- 
mitotic neurons, encouraging speculation 30 . Even if DNA re- 
arrangements were to occur, there is as yet no sign of the multiple 
combinatorial variation seen in the immune system. Nonetheless, 
the existence of >50 genes for protocadherins (conceivably 2500 
heterodimers) offers a fair degree of variation. 
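
The figure of 2500 quoted above matches a count of every ordered pair drawn from 50 genes (50 × 50). A minimal sketch of the alternative counting conventions follows; the function name and the choice of n = 50 are illustrative assumptions, and the text does not settle which convention is biologically appropriate.

```python
from math import comb


def dimer_counts(n_genes: int) -> dict:
    """Count pairwise subunit combinations under three conventions."""
    return {
        # ordered pairs, self-pairing allowed: n * n
        "ordered_pairs": n_genes ** 2,
        # unordered pairs of two different subunits: C(n, 2)
        "distinct_heterodimers": comb(n_genes, 2),
        # unordered pairs, homodimers included: C(n, 2) + n
        "dimers_incl_homodimers": comb(n_genes, 2) + n_genes,
    }


counts = dimer_counts(50)
for name, value in counts.items():
    print(f"{name}: {value}")
```

Under any of these conventions the count grows quadratically with the number of genes, so the qualitative conclusion (a fair degree of variation) does not depend on the convention chosen.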

Our current picture of leukocyte adhesion to the endothelium 
offers a good example of how a high degree of specificity in cell 
adhesion can be generated using only a limited number of not 


particularly selective adhesion receptors 5,6. Three selectins and their
ligands, three to five integrins and five to six Ig-SF receptors appear
to be sufficient to target leukocytes specifically to multiple sites
during inflammation or lymphocyte trafficking. This selectivity relies
on tightly regulated expression and, importantly, on activation of the
integrins through crosstalk from selectins and chemokine receptors.
The specificity therefore relies more on spatiotemporal regulation,
combinatorial expression and activation of several receptors than 
on the intrinsic specificity of individual receptors. 

Therefore, in considering how to explain the specificity of 
cell-cell adhesion, we have a fairly large number of receptors 
(hundreds), and we will soon know exactly how many. Combi- 
natorial display and the ability of these receptors to cooperate 
with each other and with 'classical' signalling receptors and to be 
fine-tuned in terms of their state of activation could provide 
enough potential spatiotemporal specificity. The challenge now
will be to exploit our knowledge of the list of players to understand 
the complexity of individual biological systems. 

While questions arising from developmental biology repre- 
sented one impetus to understand cell adhesion, others came from 
a desire to understand pathological processes. Altered adhesion 
properties were recognized early as a feature of cancer cells, and the 
tightly regulated adhesion of blood cells is central to haemostasis, 
thrombosis, leukocyte trafficking and inflammation. A satisfying 
outcome of cell-adhesion research has been the discovery that most 
cell-adhesion events, be they developmental, physiological or patho- 
logical, rely on members of a limited number of families of cell- 
adhesion receptors. This realization has led to a very productive 
synergy among the originally separate areas of investigation. 
Molecular analyses of cell adhesion have revealed that adhesion 
has profound effects on cells that go far beyond merely sticking 
them together. Furthermore, detailed understanding of cell-adhesion
receptors has opened the way to manipulating their functions, lead- 
ing to therapeutic strategies applicable to pathological processes 
involving cell adhesion. 


References 

1 Yap, A.S. et al. (1997) Molecular and functional analysis of cadherin-based adherens junctions. Annu. Rev. Cell Dev. Biol. 13, 119-146

2 Kohmura, N. et al. (1998) Diversity revealed by a novel family of cadherins expressed in neurons at a synaptic complex. Neuron 20, 1137-1151

3 Wu, Q. and Maniatis, T. (1999) A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell 97, 779-790

4 Walsh, F.S. and Doherty, P. (1997) Neural cell adhesion molecules of the immunoglobulin superfamily: role in axon growth and guidance. Annu. Rev. Cell Dev. Biol. 13, 425-456

5 Lasky, L.A. (1995) Selectin-carbohydrate interactions and the initiation of the inflammatory response. Annu. Rev. Biochem. 64, 113-139

6 Kansas, G.S. (1996) Selectins and their ligands: current concepts and controversies. Blood 88, 3259-3287

7 Hynes, R.O. (1992) Integrins: versatility, modulation, and signaling in cell adhesion. Cell 69, 11-25

8 Hemler, M.E. (1999) Integrins. In Guidebook to the Extracellular Matrix and Adhesion Proteins (Kreis, T. and Vale, R., eds), Sambrook and Tooze Publishers, Oxford University Press

9 Burridge, K. and Chrzanowska-Wodnicka, M. (1996) Focal adhesions, contractility, and signaling. Annu. Rev. Cell Dev. Biol. 12, 463-518

10 Jockusch, B.M. et al. (1995) The molecular architecture of focal adhesions. Annu. Rev. Cell Dev. Biol. 11, 379-416

11 Barth, A.I., Nathke, I.S. and Nelson, W.J. (1997) Cadherins, catenins and APC protein: interplay between cytoskeletal complexes and signaling pathways. Curr. Opin. Cell Biol. 9, 683-690

12 Ben-Ze'ev, A. and Geiger, B. (1998) Differential molecular interactions of beta-catenin and plakoglobin in adhesion, signaling and cancer. Curr. Opin. Cell Biol. 10, 629-639

13 Henry, M.D. and Campbell, K.P. (1996) Dystroglycan: an extracellular matrix receptor linked to the cytoskeleton. Curr. Opin. Cell Biol. 8, 625-631

14 Schwartz, M.A. et al. (1995) Integrins: emerging paradigms of signal transduction. Annu. Rev. Cell Dev. Biol. 11, 549-599

15 Clark, E.A. and Brugge, J.S. (1995) Integrins and signal transduction pathways: the road taken. Science 268, 233-239

16 Giancotti, F.G. (1997) Integrin signaling: specificity and control of cell survival and cell cycle progression. Curr. Opin. Cell Biol. 9, 691-700

17 Hemler, M.E. (1998) Integrin associated proteins. Curr. Opin. Cell Biol. 10, 578-585

18 Qian, D. and Weiss, A. (1997) T cell antigen receptor signal transduction. Curr. Opin. Cell Biol. 9, 205-212

19 Isakov, N. (1997) Immunoreceptor tyrosine-based activation motif (ITAM), a unique module linking antigen and Fc receptors to their signaling cascades. J. Leukocyte Biol. 61, 6-16

20 Pasquale, E.B. (1997) The Eph family of receptors. Curr. Opin. Cell Biol. 9, 608-615

21 Kimble, J. and Simpson, P. (1997) The LIN-12/Notch signaling pathway and its regulation. Annu. Rev. Cell Dev. Biol. 13, 333-361

22 Neel, B.G. and Tonks, N.K. (1997) Protein tyrosine phosphatases in signal transduction. Curr. Opin. Cell Biol. 9, 193-204

23 Culotti, J.G. and Merz, D.C. (1998) DCC and netrins. Curr. Opin. Cell Biol. 10, 609-613

24 Stark, K.A. et al. (1997) A novel alpha integrin subunit associates with betaPS and functions in tissue morphogenesis and movement during Drosophila development. Development 124, 4583-4594

25 Arndt, K. et al. (1998) Cadherin-defined segments and parasagittal cell ribbons in the developing chicken cerebellum. Mol. Cell. Neurosci. 10, 211-228

26 Sperry, R.W. (1963) Chemoaffinity in the orderly growth of nerve fiber patterns and connections. Proc. Natl. Acad. Sci. U. S. A. 50, 703-710

27 Uchida, N. et al. (1996) The catenin/cadherin adhesion system is localized in synaptic junctions bordering transmitter release zones. J. Cell Biol. 135, 767-779

28 Fannon, A.M. and Colman, D.R. (1996) A model for central synaptic junctional complex formation based on the differential adhesive specificities of the cadherins. Neuron 17, 423-434

29 Nishimura, S.L. et al. (1998) Synaptic and glial localization of the integrin alphavbeta8 in mouse and rat brain. Brain Res. 791, 271-282

30 Gao, Y. et al. (1998) A critical role for DNA end-joining proteins in both lymphogenesis and neurogenesis. Cell 95, 891-902


Acknowledgements 
I thank Denisa Wagner for helpful criticism of the text, Colleen Leslie
for editorial assistance and the NIH and the Howard Hughes Medical
Institute for support.


The Journal of Biological Chemistry
Minireview
Vol. 269, No. 41, Issue of October 14, pp. 25235-25238, 1994
© 1994 by The American Society for Biochemistry and Molecular Biology, Inc.
Printed in U.S.A.


Integrin-mediated Cell Adhesion: 
The Extracellular Face* 

Joseph C. Loftus, Jeffrey W. Smith, and 
Mark H. Ginsberg

From the Department of Vascular Biology, The Scripps 
Research Institute, La Jolla, California 92037 

Cell adhesion regulates embryonic development by controlling 
cell migration, growth, and differentiation. Additionally, adhesion 
contributes to the processes of malignant transformation,
inflammation, hemostasis, and immune recognition (1, 2). Cell adhesive
events are mediated by transmembrane receptors that belong to a 
limited number of supergene families. These include the integrins 
(3, 4), immunoglobulin supergene family (5), cadherins (6), selec- 
tins (7, 8), CD44-related molecules (9, 10), and transmembrane 
proteoglycans (11). The role of each of these supergene families as 
general cell adhesion molecules has only been appreciated in the 
last 10 years, yet cell adhesion has now become an area of intensive 
investigation. This rapid progress precludes a comprehensive re- 
view of this entire field. We will focus on the integrin family, since 
the key insights concerning these receptors are likely to have some 
relevance to other adhesion receptors. 

All integrins discovered to date are heterodimers of α and β
subunits, which are products of separate genes (2, 4, 12, 13) and are
mutually interdependent for correct processing and surface
expression (14). To date, the integrin family is composed of 14 α subunits
and 8 β subunits. Integrins possess a generally conserved structure:
a large extracellular domain formed by both the ~1000-residue α
and ~750-residue β subunits and a transmembrane segment from
each subunit. In general, each subunit possesses a short cytoplasmic
C-terminal tail (15-17) with the striking exception of the β4 subunit
whose cytoplasmic domain is more than 1000 residues (18-20).
When the integrin gene family was first proposed in 1987 (21, 22),
a simple vertical organization consisting of three β subunits that
formed unique heterodimers with distinct α subunits was
envisaged. The α subunits, at the time, were thought to pair only with
specific β subunits. With the discovery that the α subunits appear
to have evolved independently of β subunits (23) and the discovery
of novel β subunits (24-26), which form heterodimers with existing
α subunits, and with the discovery that certain α subunits can form
heterodimers with multiple β subunits (24, 26, 27) more complicated
schemes have been required to characterize this family (4). To date,
all integrin gene disruptions are associated with evident
phenotypes. These findings unambiguously establish the critical biological
role of integrins (28-33). In general, integrins perform their func- 
tions by interacting with components of the insoluble extracellular 
matrix or surface proteins on other cells to form links with intra- 
cellular elements involved in bidirectional signaling events. These 
signaling events and the integrin cytoplasmic domains that mediate 
many of them have been extensively reviewed recently (4, 12, 15-17, 
34-37). Thus, we will focus on the recognition of adhesive ligands by 
the extracellular domain of integrins. 

Integrins bind to diverse ligands including components of the 
extracellular matrix (38-40), cell surface Ig superfamily receptors 
(41), components of microorganisms (42-44), and certain plasma 
proteins (45, 46). In most cases, these interactions are divalent
cation-dependent, and mapping of integrin recognition sequences
in ligands almost invariably identifies an acidic residue as a key
component (47-55). In some protein ligands, these recognition sites
can be assigned to short linear peptide sequences, e.g. Arg-Gly-Asp
(48). Even in these cases, additional discontinuous regions of the
protein may be involved in ligand recognition as demonstrated in
fibronectin (56-58) (Fig. 1). Indeed, the general theme seems to be
that integrins will recognize short peptide sequences often
presented in extended loops containing β turns (59-63). Other regions
of protein ligands may then contribute secondary interactive sites. 

There are also emerging generalities about the sites in integrins 
that recognize ligands. Electron microscopy of rotary shadowed 
preparations of purified integrins suggests that these receptors are 
comprised of a globular ~10-nm head (64, 65) to which ligands bind
(66) and two rodlike tails probably containing C-terminal portions
of α (66) and β (67) subunits and their transmembrane domains
(68). Biophysical analyses of integrins in detergent solution agree,
in general, with the idea that these are asymmetric molecules (69,
70). In addition, Calvete and co-workers (71-76) have examined the
disulfide bond arrangements and intersubunit contacts in
proteolytic fragments of a prototype integrin, αIIbβ3. Rocco and co-workers
(77) made an ingenious effort to accommodate these findings to the
biophysical and electron microscopic data. Integrins are
conformationally labile (78-81) and are subject to proteolysis (82) and
disulfide bond rearrangement (83). Thus the purification method and
storage of the receptor (84, 85) should be considered in evaluating 
studies from different groups. 

Potential ligand binding sites in integrins have been investi- 
gated through a combination of immunological, biochemical, and 
mutational approaches. Proteolysis (86) or expression of
recombinant truncated (87) αIIbβ3 produced ligand binding fragments that
contain at least the N-terminal half of the α and β subunits. These
results are in good agreement with previous cross-linking studies
that suggested that ligand recognition sites reside in the
N-terminal portion of both αIIb (88, 89) and β3 (88, 90, 91) subunits. These
results also support the concept that high affinity ligand
recognition requires both subunits (84, 92-96) and consequently may
involve multiple ligand contact points. Several such potential contact
points have now been identified. 

Chemical cross-linking of an RGD peptide to αIIbβ3 followed by
proteolytic digestion and amino acid sequencing indicated that a
72-residue sequence in β3 (Asp109-Glu171) is proximal to bound
ligand (91). This localization was supported by identification of the
overlapping region of β3 (Glu^-Glu220) by photoaffinity
cross-linking of an RGD peptide to at^g (90). This region of β3 probably is
directly involved in ligand recognition because 1) point mutations 
in the region abrogate ligand binding function (97, 98), 2) certain 
antibodies directed against this region inhibit ligand binding (99- 
101), and 3) a gain of ligand binding function mutation involves this 
region (102). This region of β3 is highly conserved among integrin
classes, suggesting that this is a common ligand contact site in all
integrins. That hypothesis has been supported by the loss of ligand
binding function associated with mutations at residues homologous
to Asp119 in β1 (103) and β6 (104).

The prototype mutation in this ligand contact site, β3 (D119Y), also
alters the conformation of αIIbβ3 in a manner consistent with loss
of bound divalent cation (105). Moreover, Asp119 resides within a
region of β3 enriched in amino acids with oxygenated side chains
(Asp119, Ser121, Ser123, Asp126, Asp127, and Ser130) whose linear
spacing approximates that of the oxygenated residues in the calcium
binding loop of EF-hand proteins (106), suggesting that these
residues may provide coordinating ligands for divalent cations. A
synthetic β3 peptide corresponding to residues 118-131 directly
binds terbium, a luminescent calcium analog. Moreover, substitution
of Asp119 by alanine in this peptide reduced the peptide/terbium
(107) interaction, supporting involvement of this region in both
ligand recognition and cation binding. Alanine substitutions in β3
Asp119, Ser121, or Ser123 resulted in deficits in the binding of
both macromolecular and peptide ligands. In contrast, substitutions
at positions Asp126, Asp127, or Ser130 did not affect ligand binding
(98). These results further implicate this region of β3 in ligand
binding function of αIIbβ3 and assign functional roles to Asp119,
Ser121, and Ser123. This cluster of three oxygenated residues is
also highly conserved among the integrin β subunits (Fig. 2). The
divalent cation dependence of integrin function and the high degree
of conservation of the functionally important oxygenated residues
suggest that ligands interact with divalent cations bound to this
highly conserved site in the β subunit (97). This hypothesis is
further supported by the presence of critical oxygenated residues in
integrin ligands (47-55) and the evidence that ligand binding to
integrins may displace divalent cations (80, 108).

Fig. 1. α carbon backbone representation of a model of the
fibronectin 9th and 10th type III repeats. The central cell binding
domain of fibronectin is comprised of ~90-residue repeats designated
"type III repeats." Integrin recognizes the 9th (3Fn9) and 10th
(3Fn10) type III repeats. Depicted is a stereo view of the backbone
of a fibronectin 3Fn(9-10) structural model generated as described
(58). Distinct, linear peptide sequences that regulate
fibronectin-integrin interaction have been identified within each of
these fibronectin type III repeats. The Asp1373-Thr1383 sequence in
the 3Fn9 repeat and the Arg1493-Asp1495 sequence in the 3Fn10 repeat
are shown as dotted lines. The N terminus (N) and the C terminus (C)
are shown. Note that each recognition sequence resides within an
extended loop. (Reprinted, with permission, from Ref. 58.)

A second highly conserved potential ligand interactive site in β3
was identified through a synthetic peptide corresponding to β3
(211-222) that bound to fibrinogen and blocked its binding to αIIbβ3
(109). Antibodies directed against this peptide also inhibited the
binding of adhesive proteins to the purified receptor (109). The
mechanism of action of the synthetic β3 (211-222) has been
questioned, since the peptide also binds specifically to αIIbβ3
(110). Furthermore, two natural mutations at β3 Arg214 (111, 112)
result in loss of ligand binding function. Interpretation of that
result is clouded by the finding that these mutations can also
impair the stability of the αIIbβ3 heterodimer (113).

α subunit ligand contact sites have also been identified by chemical
cross-linking approaches. Ligand-mimetic peptides cross-linked to
the N-terminal region of αIIb (114) and αv (89) subunits. This
localization was refined to a 21-amino acid stretch of αIIb defined
by Ala294-Met314 (114), a region spanning the second putative
divalent cation binding repeat of αIIb. Peptides from this region
and antibodies against them are reported to inhibit fibrinogen
binding to αIIbβ3, supporting a role in ligand recognition (115). In
addition, a synthetic peptide corresponding to αIIb (300-312)
inhibits clot retraction and platelet aggregation and directly binds
fibrinogen (116). The immediate proximity of this peptide to the
αIIb (296-306) sequence further substantiates the importance of this
region in the ligand binding function of the receptor. Finally, a
recombinant fragment of αIIb, spanning all four putative cation
binding sites, has been reported to bind to cations and to
fibrinogen (117). Mutational analyses of the cation binding sites in
the α subunits have been hampered by the fact that some of these
mutations block receptor expression (116, 118). Nevertheless,
mutational evidence for a role of the cation binding site in the α4
subunit has appeared (116). The involvement of the cation binding
sites in α4 in ligand binding suggests that these regions, like β3
(109-171), may function in a general mechanism of ligand binding to
integrins. Recent studies with αL lend credence to this idea (119).

      •   •   •
β3    D L S Y S M K D D L W S I Q N
β1    D L S Y S M K D D L E N V K S
β2    D L S Y S M L D D L R N V K S
β4    D P S N S M S D D L D N L K K
β5    D L S L S M K D D L D N I R S
β6    D L S A S M D D D L N T I K E
β7    D L S Y S M K D D L E R V R Q
β8    D V S A S M H N N I E K L N S

Fig. 2. Alignment of the region of the putative β3 ligand binding
site with the deduced sequences of the other integrin β subunits.
Alanine substitution of indicated residues in β3 (•) results in loss
of ligand binding function. Boxed are highly conserved amino acids
with oxygenated side chains with a linear spacing similar to the
oxygenated residues in the calcium binding loop of EF-hand proteins
(106). (Reprinted, with permission, from Bajt and Loftus (98).)

At least six integrin α subunits contain an additional ~200-residue
sequence in their N-terminal third (120, 121). This sequence appears
to have been inserted by exon shuffling (122-125); hence it is
sometimes referred to as the I (Inserted) domain, and it is
homologous to the A domains of von Willebrand factor (126-130).
There is now growing evidence for the functional importance of this
domain in ligand binding. Function-altering antibodies map to I
domains of αMβ2 (Mac-1, CD11b/CD18) (131, 132) and α2β1 (VLA-2)
(133). Moreover, mutations in these domains block ligand binding
function (132, 133), and an isolated I domain binds ligands (132).
The I domain of αM also binds cations (132). Moreover, there are
critical oxygenated residues in the I domain for both cation and
ligand binding (132, 133). Alignment of a region containing a
functional Asp in the I domains of αM and α2 (132, 133) with the
proposed ligand and cation binding site in the β3 subunit reveals
striking similarities (Fig. 3). From this alignment, a conserved
motif of D(Φ)5DXSXSΦ, where Φ is any hydrophobic residue and X is
any residue, seems evident. In view of the sequence divergence in
flanking residues, this similarity probably arose by convergence,
possibly driven by a common function of divalent cation-dependent
interactions. Further mutational analysis and evaluation of this
motif in some of the other proteins in which it is found (e.g.
cartilage matrix protein (134), type VI collagen (135), and
thrombospondin-related anonymous protein (136)) should test this
hypothesis.
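
The C-terminal part of the proposed motif (DXSXSΦ) can be checked
against the β subunit segments of Fig. 2 with a short pattern match.
This is only an illustrative sketch: the sequences are transcribed
from the Fig. 2 alignment as printed here, and the set of residues
counted as "hydrophobic" (Φ) is an assumption, since the text does
not enumerate it.

```python
import re

# Segments transcribed from the Fig. 2 alignment (beta3 residues 119-133
# and the aligned stretch of each other beta subunit).
BETA_SEGMENTS = {
    "beta3": "DLSYSMKDDLWSIQN",
    "beta1": "DLSYSMKDDLENVKS",
    "beta2": "DLSYSMLDDLRNVKS",
    "beta4": "DPSNSMSDDLDNLKK",
    "beta5": "DLSLSMKDDLDNIRS",
    "beta6": "DLSASMDDDLNTIKE",
    "beta7": "DLSYSMKDDLERVRQ",
    "beta8": "DVSASMHNNIEKLNS",
}

# Assumption: residues treated as hydrophobic (Phi) for this sketch.
HYDROPHOBIC = "AVLIMFWYC"

# D-X-S-X-S-Phi, anchored at the start of each aligned segment.
CORE_MOTIF = re.compile(rf"D.S.S[{HYDROPHOBIC}]")


def has_core_motif(segment: str) -> bool:
    """Return True if the segment starts with D-X-S-X-S-Phi."""
    return CORE_MOTIF.match(segment) is not None


matches = {name: has_core_motif(seq) for name, seq in BETA_SEGMENTS.items()}
for name, ok in matches.items():
    print(f"{name}: {'match' if ok else 'no match'}")
```

Under these assumptions every aligned segment satisfies the pattern,
which is consistent with the conservation of Asp119, Ser121, and
Ser123 described above.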

As noted, the evidence for the role of the I domain in ligand 
binding is compelling. It is notable that I domain integrins mani- 
fest some differences in divalent cation preferences (137-140). 
Moreover, although some inhibitory peptides have been identified 
for these integrins (141-143) they appear to be of relatively low 
affinity. This should be contrasted with the high affinity small 
ligands that have been found for some other integrins (144-150). 
Thus it is possible that the I domain integrins have a distinct 
binding mechanism from other integrins. This possibility should be 
readily tested by mutational analysis of the highly conserved ligand
binding site in β subunit partners of I domain-containing α
subunits.

All of the ligand-integrin interactions discussed above involve
divalent cations. Divalent cations could induce conformational
changes exposing the integrin's ligand binding surfaces in a manner
analogous to troponin C (151). Alternatively, the divalent cation
may be part of the active site. As noted above, the structures of
the proposed cation binding motifs in integrins and the sequences of
integrin ligands suggest that they could simultaneously coordinate a
single divalent cation. The cation binding motifs in the α subunits
lack the 12th residue in the EF-hand calcium binding loops (152).
Thus, Asp or Glu residues within the ligand could provide the final
acidic residue to the cation coordination loop of the integrins
(153, 154). Recently, quantitative analysis revealed that each
αIIbβ3 possesses three Mn2+ binding sites; however, addition of
synthetic peptide ligand resulted in displacement of manganous ions
from two of these sites (108). This result is consistent with the
finding that ligand binding to integrins results in conformational
changes similar to those seen following chelation of divalent
cations (80, 105). It is possible that during ligand binding a
transient ternary complex is formed, followed by additional
receptor-ligand contacts and displacement of the cation from the
integrin. Additional support for this idea comes from the
observation that when Co(II) bound to β3 integrins is oxidized to
Co(III), an inert form of the ion that cannot exchange or be
displaced, ligand binding is inactivated (155). One important
prediction is that a divalent cation should be able to suppress
ligand binding. Such cation-mediated suppression has been reported
for several integrin-ligand pairs (137-140).

Fig. 3. Conservation of potential integrin ligand binding domains.
Alignment of relevant integrin I domain and β subunit sequences
demonstrates remarkable conservation and suggests a potential new
motif involved in cation-dependent protein-protein interaction.
Residues where mutations block ligand binding function are boxed.
CON, consensus; Φ, any hydrophobic residue; X, any residue.

Integrin extracellular domains have masses in excess of 200 kDa.
Consequently, it is not surprising that they may also undergo
cation-independent binding interactions. Plasminogen binding to
αIIbβ3 was the first example of a cation-independent integrin ligand
(157). Moreover, plasminogen also bound to αIIbβ3 (β3 D119Y), a
mutant that fails to bind to other ligands (157). Since plasminogen
activators colocalize with integrins in focal adhesions (158), this
interaction may facilitate local plasmin formation at these membrane
microdomains. Further, many of the two-chain α subunit integrins are
posttranslationally cleaved, probably resulting in a C-terminal
lysine on their heavy chain (159, 160), presenting a likely site for
plasminogen binding. In addition, there is now abundant evidence
that certain integrins concentrate in cell-cell junctions and may
mediate cell-cell interactions (161-163), in some cases through
homophilic binding (164, 165). Further, there are integrin-dependent
cell-cell interactions that do not appear to be mediated through
"classical" integrin-dependent recognition mechanisms (156,
166-169). Thus, there will probably be new surprises and new
insights arising from the continuing analysis of integrin-ligand
interactions.
ABSTRACT The existence of integrin-like proteins in Candida albicans
has been postulated because monoclonal antibodies to the leukocyte
integrins αM and αX bind to blastospores and germ tubes, recognize a
candidal surface protein of ~185 kDa, and inhibit candidal adhesion
to human epithelium. The gene αINT1 was isolated from a library of
C. albicans genomic DNA by screening with a cDNA probe from the
transmembrane domain of human αM. The predicted polypeptide (αInt1p)
of 188 kDa contains several motifs common to αM and αX: a putative I
domain, two EF-hand divalent cation-binding sites, a transmembrane
domain, and a cytoplasmic tail with a single tyrosine residue. An
internal RGD tripeptide is also present. Binding of anti-peptide
antibodies raised to potential extracellular domains of αInt1p
confirms surface localization in C. albicans blastospores. By
Southern blotting, αINT1 is unique to C. albicans. Expression of
αINT1 under control of a galactose-inducible promoter led to the
production of germ tubes in haploid Saccharomyces cerevisiae and in
the corresponding ste12 mutant. Germ tubes were not observed in
haploid yeast transformed with vector alone, in transformants
expressing a galactose-inducible gene from Chlamydomonas, or in
transformants grown in the presence of glucose or raffinose.
Transformants producing αInt1p bound an anti-αM monoclonal antibody
and exhibited enhanced aggregation. Studies of αInt1p reveal novel
roles for primitive integrin-like proteins in adhesion and in
STE12-independent morphogenesis.


The opportunistic pathogen Candida albicans is the leading 
cause of invasive fungal disease in neonates, diabetics, and 
immunocompromised patients and carries a high mortality 
despite prompt and appropriate anti-fungal therapy (1-3). 
Three important events in the pathogenesis of invasive can- 
didal infection include adhesion to epithelium, penetration of 
epithelial barriers, and hematogenous dissemination. Compli- 
cating this cascade is the yeast's ability to transform from 
blastospores at the epithelial surface to elongated structures 
(germ tubes, pseudohyphae, and mycelia) that invade under- 
lying tissues. 

Several investigators have reported the existence of surface
proteins in C. albicans that are antigenically, structurally, and
functionally related to the α-subunits of the leukocyte integrins
αM/β2 (Mac-1; CD11b/CD18) and αX/β2 (p150,95; CD11c/CD18) (4-11).
Many monoclonal antibodies (mAbs) recognizing epitopes of αM or αX
bind to blastospores or germ tubes of C. albicans (4-10).
iC3b-coated sheep erythrocytes rosette with germ tubes of C.
albicans (9), and the affinity constants for the binding of purified
human iC3b to C. albicans or to leukocyte αM/β2 are virtually
identical (5, 12). Environmental conditions such as increased
temperature or glucose concentrations ≥10 mM augment not only the
surface expression of this integrin-like protein (5, 11) but also
the binding of iC3b (12). Recognition of ligands containing the
tripeptide sequence arginine-glycine-aspartic acid (RGD) facilitates
the adhesion of C. albicans to endothelial and epithelial cells (6,
11).

Among the leukocyte integrins, αM and αX share ≈70% sequence
homology and considerable functional identity. These two α-subunits,
together with αL, α1, and α2, contain an inserted or I domain of ca.
200 amino acids that is involved in ligand binding (13-15). Located
just C-terminal to the I domain in αM/αX are three divalent
cation-binding sites; at the C terminus are a membrane-spanning
region and a cytoplasmic tail, the latter containing a single
tyrosine residue in αM and αX (13).

This manuscript reports the isolation of a C. albicans gene encoding
a protein that shares these integrin motifs.§ Moreover, expression
of the gene product in haploid Saccharomyces cerevisiae is
associated with the production of germ tubes independently of
Ste12p, a yeast transcription factor required for morphologic change
in response to mating pheromones and nutrient limitations in S.
cerevisiae (16). These results open the way for the discovery of
other integrin-like proteins in primitive eukaryotes, for their
study as precursors of vertebrate integrins, and for more detailed
investigation of their roles in signal transduction and
morphogenesis.

MATERIALS AND METHODS 

Yeast Strains, Plasmids, and Reagents for Cloning. C. albicans 10261
(B311, serotype A) was purchased as a lyophilate (American Type
Culture Collection). Candida tropicalis 7555 was isolated from the
blood of a fungemic patient by the University of Minnesota Clinical
Microbiology Laboratory. S. cerevisiae YPH500 (MATα ura3-52 lys2-801
ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1) is a galactose-utilizing strain
obtained from the Yeast Genetic Stock Center (Berkeley, CA) (17).
pBM272, an ARS/CEN-based yeast shuttle vector containing the URA3
gene and the S. cerevisiae GAL1-10 promoter (18), pGG201 containing
a 990-bp open reading frame encoding a DNA-binding protein from
Chlamydomonas reinhardtii (19), a 750-bp Cla I/HindIII fragment of
the C. albicans actin gene, and S. cerevisiae strain M12B-T2 were
gifts from James Bodley, Judith Berman, Paul Magee, and Beatrice
Magee (all of the University of Minnesota), respectively. pSUL16, a
gift from Judith Berman, contains the S. cerevisiae STE12 gene
interrupted with the yeast LEU2 gene (20). Escherichia coli JM101,
LE392, XL1-Blue MRF', and pBluescript II SK(+) were obtained from
Stratagene.

Cloning of αINT1. DNA from spheroplasts of C. albicans 10261 was
isolated according to standard procedures (21), digested with
Sau3AI, and packaged in λEMBL3 (Stratagene).


Abbreviations: MM, minimal medium; mAb, monoclonal antibody.
*To whom reprint requests should be addressed at: University of
Minnesota, Department of Pediatrics, Box 296 UMHC, 420 Delaware
Street, S.E., Minneapolis, MN 55455.
§The sequence reported in this paper has been deposited in the
GenBank data base (accession no. U35070).
Preliminary studies confirmed that a 3.5-kbp EcoRI fragment of C.
albicans DNA hybridized with a 314-bp EcoRI/Sma I cDNA fragment
derived from the transmembrane domain of human αM (kind gift of
Dennis Hickstein, Veterans' Administration Medical Center, Seattle).
A library enriched for 3.0- to 3.8-kbp EcoRI fragments was
constructed by digestion of genomic DNA with EcoRI and ligation to
pBluescript II SK(+). Primers for amplification of the EcoRI/Sma I
αM cDNA fragment were as follows: upstream primer,
5'-GAATTCAATGCTACCCTCAA; downstream primer,
5'-CCCGGGGGACCCCCTTCACTT. Plasmid minipreparations from a total of
200 colonies were screened by the sib selection technique for
hybridization at 50°C with 32P-labeled PCR product after
confirmation of its nucleotide sequence (13). Five clones were
isolated from three successive screenings. Two of the five clones
gave reproducible signals after hybridization with a degenerate
oligonucleotide encoding a conserved sequence (KVGFFK) in the
cytoplasmic domain of αX (22): 5'-AA(AG) GT(CT) GG(AT) TT(CT) TT(CT)
AA(AG)-3'. Both clones contained a 3.5-kbp EcoRI insert and failed
to hybridize with a degenerate oligonucleotide from the S.
cerevisiae gene USO1 (23): 5'-GAA AT(ACT) GA(CT) GA(CT) TT(AG)
ATG-3'. One of these clones (probe 2, Fig. 1A) was chosen for
further analysis. A 500-bp HindIII subfragment (probe 3, Fig. 1A)
was used to screen 20,000 clones from a library of C. albicans 10261
genomic DNA by the plaque hybridization technique (24). The largest
hybridizing insert, a 10.5-kbp Sal I fragment (Fig. 1A), was
isolated by agarose gel electrophoresis, cloned, and sequenced.

[Fig. 1 image not reproduced. Recoverable labels: restriction sites
H, Bg, C, R, X, S; probe 1, 3.8 kb; 3.5 kb; probe 3, 0.5 kb; size
marker 0.56.]

Fig. 1. (A) Restriction map of the 10.5-kbp Sal I genomic DNA
fragment isolated from C. albicans 10261, with the open reading
frame indicated by the bold arrow. Probe 1, 3.8-kbp Xba I fragment
used for Southern and Northern blotting. S, Sal I; H, HindIII; X,
Xba I; Bg, Bgl II; C, Cla I; R, EcoRI. (B) Southern blot of genomic
DNA from C. albicans 10261 (lanes 1, 4, and 7), C. tropicalis 7555
(lanes 2, 5, and 8), and S. cerevisiae YPH500 (lanes 3, 6, and 9)
digested with EcoRI (lanes 1-3), HindIII (lanes 4-6), and Xba I
(lanes 7-9) and hybridized at high stringency with
[α-32P]dGTP-labeled probe 1 (hybridization at 65°C, final wash in
0.2x SSC/0.1% SDS at 65°C). The high molecular weight band (>12 kbp
in lane 7) most likely represents incompletely digested DNA.
Positions of HindIII-digested λ DNA fragments are indicated on the
far left. EcoRI and HindIII digests of four additional S. cerevisiae
isolates from clinical and laboratory sources, as well as isolates
of Candida glabrata and Candida parapsilosis, also failed to
hybridize with probe 1.
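
The degenerate oligonucleotides quoted above can be expanded
mechanically to see how many distinct sequences each pool contains
and what peptide they encode. The following is a minimal sketch, not
the authors' procedure: the parser and function names are
illustrative, and translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(oligo: str) -> list[str]:
    """Expand notation such as 'AA(AG) GT(CT)' into every concrete sequence."""
    s = oligo.replace(" ", "")
    positions, i = [], 0
    while i < len(s):
        if s[i] == "(":            # a parenthesised set of alternative bases
            j = s.index(")", i)
            positions.append(s[i + 1:j])
            i = j + 1
        else:                      # a single fixed base
            positions.append(s[i])
            i += 1
    return ["".join(p) for p in product(*positions)]


def translate(dna: str) -> str:
    """Translate a DNA sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[k:k + 3]] for k in range(0, len(dna), 3))


alpha_x_pool = expand("AA(AG) GT(CT) GG(AT) TT(CT) TT(CT) AA(AG)")
uso1_pool = expand("GAA AT(ACT) GA(CT) GA(CT) TT(AG) ATG")

print(len(alpha_x_pool), {translate(seq) for seq in alpha_x_pool})
print(len(uso1_pool), {translate(seq) for seq in uso1_pool})
```

Every member of the first pool translates to KVGFFK, in agreement
with the conserved αX sequence named in the text.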

Sequence Analysis. Both strands of the 10.5-kbp Sal I 
fragment were sequenced by the method of gene walking on an 
Applied Biosystems model 373 automated sequencer in the 
University of Minnesota Microchemical Facility. Nucleotide 
and protein sequence analyses were performed with the Ge- 
netics Computer Group (University of Wisconsin, Madison) 
Sequence Analysis Software Package, version 7.0 (25). 

Yeast Transformation and Gene Expression. The entire open reading
frame of αINT1 (Bgl II/Sal I fragment) was subcloned into pBM272
after digestion with BamHI and Sal I, in order to place the GAL1-10
promoter upstream of the αINT1 start codon (pCG01). S. cerevisiae
YPH500 was transformed with pBM272 or pCG01 by the lithium acetate
procedure (26). Transformants were selected on agar-based minimal
medium (MM = 0.17% yeast nitrogen base/0.5% ammonium sulfate) with
2% glucose, in the absence of uracil. Induction of αINT1 was
achieved by growing transformants containing pCG01 to
mid-exponential phase in noninducing, nonrepressing medium (MM
without uracil with 2% raffinose) at 30°C, then harvesting, washing,
and resuspending them in inducing medium (MM without uracil with 2%
galactose) at 30°C. YPH500 and YPH500 transformed with vector alone
(pBM272) were grown under identical conditions.

Southern and Northern Blotting. Genomic DNA and total 
RNA were isolated and electrophoresed by standard methods 
(27-30) and transferred to Hybond N+ nylon membranes 
(Amersham) by traditional capillary blotting. 

Flow Cytometry. Anti-peptide antibodies were prepared in rabbits
(Cocalico Biologicals, Reamstown, PA) to a 23-mer peptide
encompassing the second divalent cation-binding site [amino acid
(aa) 596-618] and to a 17-mer peptide spanning the RGD site and
flanking residues (aa 1142-1158) in αINT1. The IgG fractions of
preimmune and immune rabbit sera were isolated on protein
A-Sepharose (Pharmacia). Fluorescein isothiocyanate
(FITC)-conjugated goat anti-rabbit IgG (Southern Biotechnology
Associates) was used as the secondary antibody. For experiments with
S. cerevisiae transformants, antibodies included OKM1 (anti-αM
IgG2b) or MY9 as isotype control (Coulter) and FITC-conjugated goat
anti-mouse IgM/IgG (Biosource International, Camarillo, CA) (7, 11).

Insertional Inactivation of STE12 in S. cerevisiae. YPH500 was
transformed with pSUL16 by standard techniques (26) and chromosomal
integrants of the disrupted STE12 gene were selected on
leucine-deficient MM. After confirmation of sterility, ste12 mutants
were transformed with pCG01 as described above.

Aggregation Assay. The degree of aggregation of C. albicans 
and S. cerevisiae transformants was determined according to 
published methods (31). 

RESULTS 

Restriction Map and Southern Blotting. The restriction map of αINT1
with its 5' and 3' flanking sequences is displayed in Fig. 1A. Fig.
1B shows that a 3.8-kbp Xba I probe from αINT1 hybridized with
EcoRI, HindIII, and Xba I fragments from C. albicans (lanes 1, 4,
and 7) but not from C. tropicalis 7555 or S. cerevisiae YPH500.
Among the yeast strains tested, this DNA fragment is unique to C.
albicans.

Sequence Analysis of αINT1. Analysis of the nucleotide sequence
revealed an open reading frame sufficient to encode a 1664-residue
polypeptide with a theoretical molecular mass of 187,989 Da and no
extensive homologies with other proteins. Fig. 2 compares the
derived aa sequence of αInt1p with the characteristic motifs of
several integrin α-subunits. BESTFIT analysis (32) located a
putative I domain at aa 230-470, with ~18% identity to the I domain
of human αM. Within this I-domain-like region are three potential
partial MIDAS motifs (DXSX) for the coordination of divalent cations
(33). This same region (aa 230-470) also displayed 25% identity to
the nonrepeat region of the fibrinogen-binding protein of
Staphylococcus aureus (34). Chou-Fasman analysis (35) indicated
multiple α-helices, two of them bracketing the second of two
possible EF-hand divalent cation-binding motifs (aa 283-295 and aa
601-613). Fig. 3 shows that the amino acid sequence of the second
divalent cation-binding site from αInt1p differs from the EF-hand
consensus sequence (36) at only one residue, a non-cation
coordinating site. A hydrophobic sequence is located at aa 1592-1617
as determined by Kyte-Doolittle hydrophobicity plotting (37). Just
C-terminal to this putative membrane-spanning region in αInt1p is a
unique tyrosine residue, also present in the cytoplasmic tails of αM
and αX (13, 22).

Fig. 2. Schematic diagram comparing the structures of αM, αX, and α2
with that of αInt1p. Gray regions represent the ligand-binding or I
domain, the EF-hand divalent cation-binding motifs are indicated in
black, and the transmembrane regions are hatched. RGD indicates the
approximate location of this sequence in αInt1p (aa 1149-1151). The
α-subunit schematic is modified from the sequence reported by Corbi
et al. (13).
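
The comparison with the EF-hand consensus (Fig. 3) can be expressed
as a position-by-position check. This is an illustrative sketch
under stated assumptions rather than the analysis used in the paper:
position 4 of the consensus is read as an excluded set {ILVFYW} and
position 8 as (ILVMC), because the printed figure is partly garbled
at those positions, and the function name is hypothetical.

```python
# Each entry is None (any residue, "X"), or (must_be_in_set, residues):
# True  -> the residue must belong to the set ("acceptable" residues)
# False -> the residue must NOT belong to the set ("unacceptable" residues)
EF_HAND_CONSENSUS = [
    (True, "D"),          # 1
    None,                 # 2
    (True, "DNS"),        # 3
    (False, "ILVFYW"),    # 4  (assumed to be an excluded set)
    (True, "DENSTG"),     # 5
    (True, "DNQGHKR"),    # 6
    (False, "GP"),        # 7
    (True, "ILVMC"),      # 8  (assumed reading of the garbled set)
    (True, "DENQTSGCA"),  # 9
    None,                 # 10
    None,                 # 11
    (True, "DE"),         # 12
    (True, "ILVMFYW"),    # 13
]


def mismatched_positions(site: str) -> list[int]:
    """Return the 1-based positions at which a 13-residue site breaks the consensus."""
    if len(site) != len(EF_HAND_CONSENSUS):
        raise ValueError("site must contain exactly 13 residues")
    bad = []
    for pos, (rule, residue) in enumerate(zip(EF_HAND_CONSENSUS, site), start=1):
        if rule is None:
            continue
        must_be_in_set, residues = rule
        if (residue in residues) != must_be_in_set:
            bad.append(pos)
    return bad


SITE_I = "NNNNSKNVSDMDS"   # alphaInt1p site I, as printed in Fig. 3
SITE_II = "DSNDGDREDNDDI"  # alphaInt1p site II, as printed in Fig. 3

print("site I mismatches:", mismatched_positions(SITE_I))
print("site II mismatches:", mismatched_positions(SITE_II))
```

With these readings, site II departs from the consensus at a single
position, in line with the statement in the text, while site I
departs at two.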

In the upstream sequence, a putative TATA box is located at −34 from
the start codon. The coding sequence also displays 24
N-glycosylation sites, 6 cysteine residues, and the tripeptide
sequence arginine-glycine-aspartic acid (RGD) (aa 1149-1151), a
feature of many integrin ligands but not of integrins themselves.

Localization of αInt1p in C. albicans and S. cerevisiae. Polyclonal
antibodies prepared against the second divalent cation-binding site
and the RGD sequence and flanking residues in αInt1p recognized
64-82% of C. albicans blastospores, while preimmune IgG bound to
only 0.5-1% of yeast cells (P < 0.0001) (Table 1). These results
confirm that αInt1p is a surface protein in C. albicans and that the
second cation-binding site and the RGD site are in the extracellular
region of the polypeptide. In S. cerevisiae, the binding of the
anti-αM mAb OKM1 was significantly greater in transformants
expressing αINT1 vs. transformants containing vector alone for
percent yeasts fluorescing (19.0% vs. 6.2%; P < 0.004) and for mean
channel fluorescence (181.8 vs. 65.7; P < 0.013). These results
confirm that αInt1p is surface-borne in S. cerevisiae transformants
and is recognized by an anti-integrin mAb.

(A)
Position 1-7:   (D) - X - (DNS) - {ILVFYW} - (DENSTG) - (DNQGHKR) - {GP} -
Position 8-13:  (ILVMC) - (DENQTSGCA) - X - X - (DE) - (ILVMFYW)

(B)
Position    1 2 3 4 5 6 7 8 9 10 11 12 13
αInt1p I    N-N-N-N-S-K-N-V-S-D-M-D-S
αInt1p II   D-S-N-D-G-D-R-E-D-N-D-D-I

Fig. 3. Comparison of divalent cation-binding motifs. (A) Consensus
sequence for the 13-residue EF-hand divalent cation-binding motif
(36). (B) The N-terminal (I) and more distal cation-binding site
(II) in αInt1p. The standard single letter code for aa residues is
used. (...), Acceptable amino acids; {...}, unacceptable amino
acids; X, any amino acid. Cation coordinating sites are indicated in
boldface type.

Table 1. Percent yeasts fluorescing and mean channel fluorescence of
C. albicans blastospores after incubation with anti-peptide
antibodies

Antibody source    % yeasts fluorescing    Mean channel fluorescence
Control 12         1.0 ± 0.5               67.4 ± 24.6
UMN12              82.4 ± 8.6*             317.0 ± 24.7*
Control 13         0.40 ± 0.36             36.4 ± 9.2
UMN13              64.1 ± 2.3*             266.7 ± 9.2*

Values represent the mean ± SD of three experiments done on
different days using different aliquots of C. albicans 10261. UMN12
is the antibody to the second divalent cation-binding motif and
UMN13 is the antibody to the RGD region of αInt1p. Control 12 and 13
are preimmune IgGs from rabbits prior to immunization with UMN12 and
UMN13, respectively. A one-tailed Student's t test was used for
statistical calculations.
*P < 0.0001 vs. control in all comparisons.
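
As a rough plausibility check on the Table 1 comparisons, a Student's
t statistic can be recomputed from the reported means and standard
deviations. This sketch assumes an equal-variance (pooled) two-sample
test with n = 3 per group; the paper states only that a one-tailed
Student's t test was used, so the exact variant is an assumption and
the function name is illustrative.

```python
from math import sqrt


def pooled_t_statistic(mean_a, sd_a, mean_b, sd_b, n):
    """Two-sample Student's t statistic for equal group sizes n.

    Returns (t, degrees_of_freedom) using the pooled-variance formula.
    """
    pooled_variance = (sd_a ** 2 + sd_b ** 2) / 2   # equal n in both groups
    standard_error = sqrt(pooled_variance * 2 / n)
    return (mean_b - mean_a) / standard_error, 2 * n - 2


# Percent yeasts fluorescing, Table 1: Control 12 vs. UMN12 (mean, SD; n = 3).
t_value, dof = pooled_t_statistic(1.0, 0.5, 82.4, 8.6, n=3)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

A statistic of this size on four degrees of freedom corresponds to a
very small one-tailed P value, which is at least consistent with the
significance level reported in the table footnote.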

Expression of αINT1 in C. albicans and S. cerevisiae. Hybridization
of probe 1 with total RNA isolated from C. albicans blastospores
detected message of ~5.5 kb (Fig. 4A). In S. cerevisiae, αINT1
message was detected in pCG01 transformants 6 hr after induction
with 2% galactose and continued to be expressed for at least 24 hr
(Fig. 4B, lanes 1 and 3). As expected, message was not detected in
pCG01 transformants grown under conditions of repression (Fig. 4B,
lanes 2 and 4) or in pBM272 transformants (Fig. 4B, lanes 5 and 6).

Coincident with the detection of αINT1 message, the majority of the
pCG01 transformants formed elongated cell projections reminiscent of
germ tubes (Fig. 5A). These structures continued to be present for
24 hr and could be detected at galactose concentrations > 0.05%.
pCG01 transformants exhibited polar budding, typical of haploid
organisms, rather than apical budding, which is typical of diploid
organisms. pCG01 transformants (MATα) mated to a MATa yeast strain
were able to form diploid organisms (data not shown).

Fig. 4. (A) Northern blot of C. albicans 10261 total RNA isolated
from blastospores in mid-exponential growth (arrow) in MM with 2%
glucose and hybridized with probe 1 (see Fig. 1A). (B) Northern blot
of total RNA from S. cerevisiae transformants: pCG01 transformants
grown in galactose (lanes 1 and 3), pBM272 transformants grown in
galactose (lanes 5 and 6), and pCG01 transformants grown to
mid-exponential phase (lane 2) and to late exponential phase (lane
4) under conditions of repression (2% glucose). Probe 1 was used for
hybridization. The diffuse signal at 2 kbp in lanes 2 and 4
represents nonspecific binding of the probe to the 18S ribosomal RNA
band. The signal at 1.5 kbp represents actin transcript.

Fig. 5. Phase-contrast photomicrographs of S. cerevisiae
transformants. pCG01 transformants (A) were grown to exponential
phase in raffinose and then induced with 2% galactose for 6-24 hr.
pBM272 transformants (vector without gene) (B), the parent strain
YPH500 (C), C. albicans 10261 (D), and pGG201 transformants
(galactose-inducible C. reinhardtii gene) (E) were grown
identically. All yeast cells were photographed with a Leitz Wetzlar
Laborlux 12 microscope equipped with a WILD MP551 Camera (Heerbrugg,
Switzerland). (×500.)

pBM272 transformants, YPH500, and C. albicans 10261 did not form
germ tubes when grown under identical conditions (Fig. 5 B-D). pCG01
transformants did not exhibit germ tubes when grown in 2% raffinose,
2% glucose, or noninducing concentrations of galactose (0.02%) or
when cured of the plasmid (data not shown). In addition, no germ
tubes were observed with the galactose-induced expression of an
~300-residue DNA-binding protein from C. reinhardtii (Fig. 5E).
pCG01 transformants exhibited germ tubes during growth in liquid and
on solid medium (MM with 2% galactose). Germ tubes were also
observed in yeast strain M12B-T2 transformed with pCG01. Thus, the
induction of germ tubes in haploid S. cerevisiae is specific to
expression of αINT1 from the plasmid pCG01.

Ability of Yeast Transformants to Aggregate. The aggregation
index of pCG01 transformants equaled that of C. albicans
germ tubes and significantly exceeded the aggregation index of
C. albicans blastospores and S. cerevisiae pBM272 transformants
(Table 2). This finding suggests that S. cerevisiae germ

Table 2. Percent aggregation of C. albicans and S. cerevisiae
transformants

Yeast                    % aggregation*

C. albicans
    Blastospores         62 ± 1
    Hyphae               89 ± 4†
S. cerevisiae
    pBM272               65 ± 4
    pCG01                80 ± 2‡

Values represent the mean ± SEM of four experiments, each done
in triplicate. C. albicans blastospores were grown to mid-exponential
phase in YPD medium (1% yeast extract/2% peptone/2% glucose) at
30°C. C. albicans hyphae were prepared by growth at 37°C in RPMI
1640 medium (GIBCO/BRL). S. cerevisiae pBM272 and pCG01 were
grown in galactose-containing medium (see text). A two-tailed
Student's t test was used to determine statistical significance.
*% aggregation = 100 × (OD540 final − OD540 initial)/OD540 final.
†P = 0.0013 vs. C. albicans blastospores.
‡P = 0.026 vs. S. cerevisiae pBM272.
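
The aggregation index in the footnote to Table 2 and the mean ± SEM summary can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical and the sample values are invented purely to exercise the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev


def percent_aggregation(od_initial, od_final):
    """Footnote formula: 100 x (OD540 final - OD540 initial) / OD540 final."""
    if od_final == 0:
        raise ValueError("final OD540 must be non-zero")
    return 100.0 * (od_final - od_initial) / od_final


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate experiments."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for an SEM")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical per-experiment aggregation percentages (not data from the paper).
experiments = [60.0, 62.0, 64.0, 62.0]
avg, sem = mean_sem(experiments)
print(f"{avg:.0f} ± {sem:.1f}")
```

The table reports four experiments per strain, so `mean_sem` would be called once per row with the four experiment-level percentages.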
Fig. 6. (A) Cla I digests of genomic DNA from wild-type YPH500
(lane 1) and ste12 mutant (lane 2). The blot was probed at high
stringency with a 600-bp Sph I/Cla I fragment from pSUL16. (B)
Phase-contrast photomicrograph of YPH500 ste12 mutants transformed
with pCG01 and grown in galactose for induction of αINT1.


tubes synthesizing αInt1p are functionally similar to germ
tubes in C. albicans.

Induction of Germ Tubes in ste12 Mutants of S. cerevisiae.
Insertional inactivation of STE12 in YPH500 shifted the Cla I
digestion fragment from 5.2 ± 0.1 kbp in the parent to 4.1 ±
0.1 kbp in the ste12 mutant (Fig. 6A). The EcoRI fragment
shifted from 10.5 ± 0.7 kbp (parent) to 5.0 ± 0.2 kbp (mutant).
ste12 mutants were unable to mate. After transformation of
ste12 mutants with pCG01 and induction of αINT1 expression
by growth in galactose, the mutants formed germ tubes (Fig.
6B). Therefore, the observed morphological change is independent
of STE12.

DISCUSSION 

We have isolated a gene encoding a putative integrin-like
protein in C. albicans by screening a genomic library with
conserved sequences from the transmembrane and cytoplasmic
domains of human αM. αInt1p exhibits several motifs
common to α-integrin subunits, including two EF-hand motifs
and three partial MIDAS motifs within a putative I domain, a
membrane-spanning domain, and a cytoplasmic tail with a
conserved tyrosine residue at the C terminus. Because αM and
αX recognize iC3b and fibrinogen as ligands (13), a 25%
identity with the fibrinogen-binding protein of S. aureus (34)
provides additional evidence for a relationship.

Divalent cation-binding sites in the amino acid sequence of
αM provided initial evidence of the leukocyte integrins'
relationship to other vertebrate integrins (13). Both cation-binding
motifs in αInt1p conform to the classic EF-hand
consensus sequence. In comparison, two of the three cation-binding
sites in αM agree at 11 of 13 residues; one of these and
the third site require a gap to improve the alignment (13).
Chou-Fasman analysis indicates that both divalent cation-binding
sites in αInt1p, but not αM, are bracketed by α-helices,
a conformation that facilitates cation binding (38). In addition,
αInt1p contains three partial MIDAS motifs (DXSX) within
the putative I domain. A full or partial MIDAS motif is present
in all members of the I domain superfamily (15, 33). Of note,
an I-domain-like sequence in S. cerevisiae Uso1p binds iC3b
and the anti-αM mAb Mn41 (39) but has no divalent cation-binding
sites or MIDAS motifs.

The presence of an I domain and an RGD sequence in the
extracellular region of αInt1p should contribute to the adhesive
capabilities of this protein. For example, an extracellular
RGD sequence in the filamentous hemagglutinin of Bordetella
pertussis facilitates adhesion of the bacterium to eukaryotic
cells (40). Another putative candidal adhesin is encoded by a
3.3-kbp genomic DNA fragment and enables transformed S.
cerevisiae to adhere to polystyrene or buccal epithelial cells
(41). However, its restriction map differs markedly from that
of αINT1, and the nucleotide sequence has not been published.

In addition to a role as an adhesin, αInt1p leads to the
production of germ tubes in haploid S. cerevisiae in a process
independent of STE12. Although the morphological change
correlates with expression of the candidal gene product and
not with the production of other foreign proteins, we cannot
discount the possibilities that αInt1p unnaturally disrupts the
cytoskeletal architecture or the growth cycle or that other
recognized morphogenic cascades, such as those involving the
CDC genes (42, 43), may be implicated.

To date, only two genes that participate in morphogenesis in
C. albicans have been reported. ACPR, also called CPH1,
encodes a protein of 699 aa that is 74% identical to S. cerevisiae
Ste12p (44, 45). STE12 is an essential gene in at least two
pathways involved in morphogenesis in S. cerevisiae: the
induction of pseudohyphae in diploid cells on nitrogen-limited
medium (46) and the invasive response of haploid cells on rich
solid medium (47). Thus, the induction of germ tubes in S.
cerevisiae transformants expressing αINT1 after insertional
inactivation of STE12 suggests a novel pathway for
integrin-mediated signaling. The second gene, PHR1, encodes an
≈580-aa polypeptide essential for pH-dependent morphogenesis
in C. albicans (48). ACPR and PHR1 encode intracellular
regulatory proteins. The isolation of a gene encoding a
surface-borne, integrin-like protein in C. albicans and its ability to
induce morphological variants in haploid S. cerevisiae emphasize
potential roles for αINT1 in pathogenesis, signal transduction,
and differentiation in C. albicans and S. cerevisiae.

C.G. is a St. Jude's Children's Research Hospital Fellow sponsored 
by the Pediatric Scientist Development Program. This research was 
(AI25827 and HD7031), the Pediatric AIDS Foundation, and the
American Legion Heart Research Foundation to M.H. 
Herein we describe the cDNA sequence of a novel
human gene, ITGBL1, encoding a β integrin-related
protein termed TIED [for ten β integrin epidermal
growth factor (EGF)-like repeat domains]. Overlapping
cDNA clones from fetal lung, HUVEC, and osteoblast
cDNA libraries encode a sequence comprising a
typical signal peptide, followed by a hydrophilic
471-amino acid domain containing 10 tandem EGF-like
repeats strikingly similar to those found in the
cysteine-rich "stalk-like" structure of integrin β subunits. The
EGF-like repeats of TIED and β integrins are unique in
that they alternate in homology and possess two
additional cysteines (eight in total) whose positions differ
from those in the other eight-cysteine EGF-like
domains of laminin, fibrillin, and the latent TGF-β
binding proteins. TIED mRNA transcripts of 2.8 kb were
detected in aorta, thymus, and osteogenic sarcoma
cells. The ITGBL1 gene was mapped to human
chromosome 13, band 13q33. We suggest that ITGBL1 may
be linked in some way with the evolution of the
integrin β subunits. © 1999 Academic Press


INTRODUCTION 

Integrins are a superfamily of dimeric α/β cell-surface
glycoproteins that mediate the adhesive functions of
many cell types, enabling cells to interact with one
another and with the extracellular matrix (ECM)
(reviewed by Hynes, 1992). Electron microscopy reveals
that integrins have a globular ligand-binding head
composed of parts of both subunits and two stalks that
extend to the plasma membrane (Carrell et al., 1985;

Sequence data from this article have been deposited with the 
EMBL/GenBank Data Libraries under Accession No. AF072752. 

1 To whom correspondence should be addressed. Telephone: (64)
9 373-7599, Ext 6280. Fax: (64) 9 373 7492. E-mail:
gw.krissansen@auckland.ac.nz.


Nermut et al., 1988). All eight identified integrin β
subunits are highly similar (31-46% amino acid identity),
where the stalk region is composed of a fourfold
repeat of a cysteine-rich segment that is thought to be
internally disulfide-bonded. No function has been
ascribed to the stalk region, apart from the fact that it
probably facilitates ligand binding by ensuring that the
globular head extends beyond the glycocalyx. The stalk
region appears to be a conduit for signaling events that
either lead to integrin activation or are induced in
response to ligand binding. Thus the AG89 mAb
preferentially recognizes the cysteine repeat region following
integrin activation and can itself induce activation
of β1-integrin (Takagi et al., 1997).

A previous comparison had revealed that the integrin
β subunit cysteine-rich repeats were homologous
with a cysteine-rich repeat region in domain III of
laminin B chains (Yuan et al., 1990). The four cysteine-rich
repeats in β integrin subunits were most related to
the first four repeats in domain III (20-40%). Part of
the repeat unit of the laminin B1 chain was shown to
contain a sequence similar to an EGF domain; however,
the cysteine repeats in laminin are larger than
those of EGF and contain eight rather than six cysteine
residues (Pikkarainen et al., 1988). Pairwise sequence
identity comparisons between EGF modules of different
proteins suggest that the laminin EGF repeats,
and hence also the integrin repeats, are "outliers" and
should be described as EGF-like until 3D structural
comparisons can confirm their family membership
(Campbell and Bork, 1993).

EGF-like domains contained in many growth factors,
receptors, adhesion molecules, and proteins of the
coagulation and fibrinolytic pathway have either been
shown or are expected to participate in protein-protein
or protein-cell interactions (Campbell and Bork, 1993;
Appella et al., 1988; Engels, 1989). Interestingly, EGF
domains in several proteins, including the integrin β5
(Ramaswamy and Hemler, 1990) and β6 (Sheppard et
al., 1990) subunits, the laminin-associated protein
nidogen (Timpl et al., 1990), the glycoproteins PAS-6/7
(Andersen et al., 1997) and lactadherin (Taylor et al.,
1997), and entactin (Dong et al., 1995), contain the
small tripeptide RGD, which is a major integrin binding
site. Thus PAS-6/7 and lactadherin bind the integrin
αvβ5 in an RGD-dependent fashion, and the RGD
motifs in entactin bind αvβ3 and possibly α3β1. These
EGF domains may participate in integrin-mediated
RGD-dependent cell adhesion events. The site in laminin
that mediates cell attachment, migration, and receptor
binding was localized to the peptide CDPGYIGSR
in the EGF-like repeat domain III of the B1 chain
(Graf et al., 1987). EGF domains in some ECM proteins
are mitogenic as exemplified by those in the inner short
arm structures of laminin (Panayotou et al., 1989).

Here we report the cDNA sequence of a new member
of the EGF-like protein family, termed TIED, that has
the potential to provide novel insights into the evolution
and alternative functions of the stalk structure of
integrin β subunits.

MATERIALS AND METHODS 

Cell culture. The human osteogenic sarcoma U-2OS cell line
(Ponten and Saksela, 1967) obtained from the American Type Culture
Collection (ATCC) was cultured in McCoy's 5A medium supplemented
with 10% FBS, 2 mM glutamine, 50 μg/ml penicillin, and 50
μg/ml streptomycin, at 37°C in a 5% CO2 atmosphere.

Screening of cDNA libraries. The TIED cDNA was initially identified
as an expressed sequence tag (EST) following screens for integrin
homology in an EST cDNA database using the BLAST network
service provided by the National Center for Biotechnology Information.
Partial-length TIED cDNA clones HSRAZ62 and HLHFV34
were identified in databases from human osteoclastoma and fetal
lung cDNA libraries, respectively. Further clones were identified by
screening fetal lung and umbilical vein endothelial cDNA libraries
constructed using the LambdaZAP II vector (Stratagene, La Jolla,
CA). Libraries were replica plated onto GeneScreen Plus filters
(DuPont, Boston, MA), and screened as described previously (Yuan et
al., 1992) using either a 900-bp EcoRI/EcoRI fragment from clone
HLHFV34 or a 223-bp PCR product, encompassing nucleotides 1216
to 1438, generated by PCR with the primers 62F
5'-ATGACGGAAGAACAAAGCAAGAA-3' and 62R
5'-ATCCATCCCAGCAATCACAGTT-3' from clone HSRAZ62.
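
The size of the screening probe follows directly from its coordinates, and the reverse primer has to be reverse-complemented before it can be compared with the sense strand in Fig. 1B. A minimal sketch of both steps is given below; the helper names are hypothetical, and the primer strings are the ones quoted in the text with the line-break hyphens removed.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(first_nt, last_nt):
    """Length of a product spanning first_nt..last_nt, both ends inclusive."""
    return last_nt - first_nt + 1


PRIMER_62F = "ATGACGGAAGAACAAAGCAAGAA"  # sense primer, as quoted in the text
PRIMER_62R = "ATCCATCCCAGCAATCACAGTT"   # antisense primer, as quoted in the text

print(amplicon_length(1216, 1438))      # size of the 62F/62R product in bp
print(reverse_complement(PRIMER_62R))   # sense-strand sequence bound by 62R
```

The same inclusive-coordinate arithmetic applies to any other primer pair defined by nucleotide positions on the composite cDNA.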

DNA sequencing. DNA sequences were determined by cycle sequencing
using an Applied Biosystems 373A automated DNA sequenator
(The Centre for Gene Technology, School of Biological Sciences,
University of Auckland, Auckland, New Zealand). The
composite TIED sequence was obtained on both strands of the
overlapping cDNA clones HSRAZ62, HLHFV34, S0003.9, HOHCH55,
and HUVEC5.1.1, using a combination of Universal M13 and
sequence-specific primers. Sequence analysis was performed using the
Wisconsin package version 9.1 from the Genetics Computer Group
(GCG) (Madison, WI).

Polymerase chain reaction (PCR). The expression of ITGBL1 was
analyzed by PCR using DNA templates from a human thymus cDNA
library (ATCC) and cDNA prepared from mRNA extracted from
U-2OS cells. Thermocycling parameters were 94°C for 1 min; 30
cycles of 94°C for 30 s, 58°C for 30 s, and 72°C for 30 s; followed by
a final extension at 72°C for 3 min. For the chromosomal assignment,
PCR was initially carried out with 24 cell hybrid DNAs in which 14
of the hybrids contained a single chromosome and the remaining 10
contained two to three chromosomes or one to three chromosomal
fragments (Kelsell et al., 1995). Subsequently a set of 7 cell hybrid
DNAs was employed, in which cell hybrids MOG34A4, DUR4.3,
SIR74ii, and LSR34S49 contain chromosome 13, and hybrids
TWIN19-D12, CTP34B4, and DT1.2.4 together contain all other human
chromosomes except for chromosomes 13, 9, and 19. Two primers,
GM-F (5'-CGAATGAAATCCGAGTACCTATTAG-3') and GM-R
(5'-GCATCCCTGGCCTCTACCCAC-3'), were designed to amplify a
region encompassing nucleotides 1618 to 1839 of the TIED cDNA
sequence. They amplified a PCR product of 222 bp from human DNA,
but not from mouse or hamster DNA. The PCR conditions for
amplification from cell hybrid DNAs were as above except that annealing
was carried out at 62°C and extension lasted for 45 s. PCR
products were resolved on 2% agarose gels, stained with ethidium
bromide, and transferred to GeneScreen Plus. Blots were hybridized
with a 32P-labeled 2.5-kb NotI/NotI fragment of clone HOHCH55 in
5× SSC, 5× Denhardt's solution, 50% formamide, with 1% SDS and
100 μg/ml denatured salmon sperm DNA, at 42°C. They were
washed twice in 0.1× SSC, 0.1% SDS at 60°C for 30 min and
autoradiographed.
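
The two thermocycling programs described above differ only in annealing temperature and extension time, which can be made explicit by writing them as data. The sketch below is illustrative only: the `Step` class and function names are hypothetical, and the computed duration counts programmed hold times, ignoring ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


def build_program(anneal_c=58, extend_s=30, cycles=30):
    """Program from the text: 94°C 1 min; N cycles; 72°C 3 min final extension."""
    return {
        "initial": Step("initial denaturation", 94, 60),
        "cycle": [
            Step("denature", 94, 30),
            Step("anneal", anneal_c, 30),
            Step("extend", 72, extend_s),
        ],
        "cycles": cycles,
        "final": Step("final extension", 72, 180),
    }


def hold_seconds(program):
    """Total programmed hold time in seconds (temperature ramps not included)."""
    per_cycle = sum(step.seconds for step in program["cycle"])
    return (program["initial"].seconds
            + program["cycles"] * per_cycle
            + program["final"].seconds)


expression_pcr = build_program()                      # thymus and U-2OS templates
hybrid_pcr = build_program(anneal_c=62, extend_s=45)  # cell hybrid DNAs

for name, prog in (("expression", expression_pcr), ("hybrid", hybrid_pcr)):
    print(name, hold_seconds(prog) / 60, "min")
```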

Fluorescence in situ hybridization (FISH). Metaphase spreads
were prepared from phytohemagglutinin-stimulated peripheral
blood lymphocytes of a 46, XY male donor using standard cytogenetic
procedures. The 2.5-kb insert of clone HOHCH55 was labeled with
biotin-16-dUTP using a Biotin High Prime labelling kit (Boehringer
Mannheim). Conditions for hybridization and immunofluorescent
detection were essentially as described (Morris et al., 1993), except
that Cot 1 suppression was not required, slides were washed with
0.1× SSC at 60°C, and an additional amplification step was included.
For precise chromosome band localization, DAPI and FITC images
were captured separately for each metaphase from the fluorescence
microscope using a Photometrics KAF1400 CCD camera and QUIPS
FIG. 1. Alignment of TIED cDNA clones, and their nucleotide
and deduced amino acid sequences. (A) Partial restriction map of the
composite TIED cDNA sequence and alignment of cDNA clones. The
schematic at the top shows the open reading frame (ORF) as an open
box, and 5'- and 3'-untranslated regions are shown as solid lines. The
positions of recognition sites for the restriction enzymes PvuII (P),
HindIII (H), SmaI (S), and EcoRI (E) are indicated by vertical lines.
The relative positioning of the TIED cDNA clones, which were
sequenced on both strands, is indicated at the bottom with solid lines.
The dashed region in HOHCH55 was sequenced on one strand only.
The dotted 3'-untranslated regions in HLHFV34 and HUVEC5.1.1
denote sequence variation with corresponding regions in HSRAZ62,
S0003.9, and HOHCH55. The scale bar indicates 200 bp. (B) Nucleotide
and deduced amino acid sequences of the TIED molecule. The
numbers in the left margin refer to nucleotide and amino acid
positions. The first nucleotide of the start codon and the initiator
methionine have each been assigned position 1. The 10 cysteine-rich
integrin β- and EGF-like repeats are indicated with solid lines below the
aa sequence, and the Cys residues in each repeat are numbered after
Yuan et al. (1990). The stop codon is represented by an asterisk, the
putative signal peptide and polyadenylation signal site (AATAAA)
are underlined, and a potential site for N-linked glycosylation at
position 405 is indicated (®).


GENE FOR THE β INTEGRIN-RELATED PROTEIN TIED MAPS TO 13q33




B 

-219 ACCAGCACCCCGCCCAGAGCAGTGCCGCTGCCCAAATCC 
-180 TCGCAGGCAGCTCATCAACGCAATTGCAACTCCGGCTGGAGCCCCGGACCTGCAAGCCTGGGTGTCCGTGGGTCCGTCTGCCCAGCCATC 
-90 TGCTGGTGGCACCTCTCCCTCCTGCCGCCTCCCTCGGTGAACCCCACCTTGCAGAAGTGCAGCTCGCCCGGAGCAGCCCAGGAGCTCAGC 
1 ATGCGTCCCCCAGGCTTCAGGAACTTCTTGCTGCTGGCGTCCTCCCTTCTCTTTGCTGGGTTGTCAGCTGTTCCTCAAAGCTTCTCGCCA 
1 MRPPGFRNFLLLASSLLFAGLSAVPQSFSP

91 TCTCTGAGGAGCTGGCCGGGCGCCGCCTGCAGGCTGTCCCGGGCCGAGTCCGAGCGACGCTGCCGCGCACCTGGGCAGCCCCCGGGGGCC 
31 SLRSWPGAACRLSRAESERRCRAPGQPPGA 

1 

181 GCGCTGTGCCACGGCCGGGGCCGCTGCGACTGCGGCGTCTGCATCTGCCACGTGACTGAGCCGGGCATGTTCTTCGGGCCCCTGTGTGAG 
61 ALCHGRGRCDCGVCICHVTEPGMFFGPLCE 

2 3 4 5 6 7 

271 TGCCATGAGTGGGTGTGCGAGACCTACGACGGGAGCACCTGTGCAGGCCATGGTAAGTGTGACTGTGGCAAGTGCAAGTGTGACCAGGGA
91 CHEWVCETYDGSTCAGHGKCDCGKCKCDQG

8 1 2 3 4 5 6 

361 TGGTATGGGGATGCTTGCCAGTACCCAACTAACTGTGACTTGACAAAGAAGAAAAGTAACCAAATGTGCAAGAATTCACAAGACATCATC
121 WYGDACQYPTNCDLTKKKSNQMCKNSQDII

7 8 1 

451 TGCTCTAATGCAGGTACATGTCACTGTGGCAGGTGTAAGTGTGATAATTCAGATGGAAGTGGACTTGTGTATGGTAAATTTTGTGAGTGT
151 CSNAGTCHCGRCKCDNSDGSGLVYGKFCEC 

2 3 4 5 6 7 8 

541 GACGATAGAGAATGCATAGACGATGAAACAGAAGAAATATGTGGAGGCCATGGGAAGTGTTACTGTGGAAACTGCTACTGCAAGGCTGGT 
181 DDRECIDDETEEICGGHGKCYCGNCYCKAG

1 2 3 4 5 6 

631 TGGCATGGAGATAAATGTGAATTCCAGTGCGATATCACCCCCTGGGAAAGCAAGCGAAGATGCACGTCTCCAGATGGCAAAATCTGCAGT 
211 WHGDKCEFQCDITPWESKRRCTSPDGKICS

7 8 1 2 

721 AACAGAGGGACTTGTGTATGTGGTGAATGTACCTGTCACGATGTTGATCCGACTGGGGACTGGGGAGATATTCATGGGGACACCTGTGAA 
241 NRGTCVCGECTCHDVDPTGDWGDIHGDTCE

3 4 5 6 7

811 TGTGATGAGAGGGACTGTAGAGCTGTCTATGACCGATATTCTGATGACTTCTGTTCAGGTCATGGACAGTGTAATTGCGGAAGATGTGAC
271 CDERDCRAVYDRYSDDFCSGHGQCNCGRCD 

8 1 2 3 4 5 

901 TGCAAAGCAGGCTGGTATGGGAAGAAGTGTGAGCACCCACAGTCCTGCACGCTGTCAGCTGAGGAGAGCATCAGGAAGTGCCAGGGAAGC 
301 CKAGWYGKKCEHPQSCTLSAEESIRKCQGS

6 7 8 1 

991 TCGGATCTGCCTTGCTCTGGGAGGGGTAAATGTGAATGTGGCAAATGCACCTGCTATCCTCCAGGAGATCGCCGGGTGTATGGCAAGACT 
331 SDLPCSGRGKCECGKCTCYPPGDRRVYGKT 

2 3 4 5 6

1081 TGTGAGTGTGATGATCGCCGCTGTGAAGACCTCGATGGTGTGGTCTGTGGAGGCCACGGCACATGTTCCTGTGGTCGCTGTGTTTGTGAG 
361 CECDDRRCEDLDGVVCGGHGTCSCGRCVCE

7 8 1 2 3 4 5 6

1171 AGAGGATGGTTTGGAAAGCTCTGCCAACATCCGCGGAAGTGTAACATGACGGAAGAACAAAGCAAGAATCTGTGTGAATCAGCAGATGGC 
391 RGWFGKLCQHPRKCNMTEEQSKNLCESADG 

7 8 ® 1 

1261 ATATTGTGCTCGGGGAAGGGTTCTTGTCATTGTGGGAAGTGCATTTGTTCTGCTGAAGAGTGGTATATTTCTGGGGAGTTCTGTGACTGT 
421 ILCSGKGSCHCGKCICSAEEWYISGEFCDC

2 3 4 5 6 7 8 

1351 GATGACAGAGACTGCGACAAACATGATGGTCTCATTTGTACAGGGAATGGAATATGTAGCTGTGGAAACTGTGAATGCTGGGATGGATGG
451 DDRDCDKHDGLICTGNGICSCGNCECWDGW 

1 2 3 4 5 6 

1441 AATGGAAATGCATGTGAAATCTGGCTTGGCTCAGAATATCCTTAACAATTACATGAGAGAGGTCTGGATTCTTATTTTTTCTGGGCCATT 
481 NGNACEIWLGSEYP*

7 

1531 AGAACATATAAATGCGAAGGAAACCATGTATATTCACCACTAGGACAGGTTAAAAAGACCATTGTATGTTTTTCTATTTCTGAATTACGA 
1621 ATGAAATCCGAGTACCTATTAGAAATGAGTTATGCAAATTTAGATGCAAATAACATTAGAAAAAAAAGATTCTTCCATAATTAACATAAG 
1711 TGGTTCCTAACGAGAGCAATTTTTCCACCCAAAAGTCATTTGGCAACATCTACAGACAATTTTGATTGTCACACTGGGTCGGGTAGGAAG 
1801 GTATGCTGCAGACATTTGGTGGGTAGAGGCCAGGGATGCTGCTGAGCATCCCGCAGTGTACAGGACAGCCCCCAAACAAGGAATTATCCA
1891 GCCCCAAATGCCAATAGGGCTCAAACTGAGAAACATTGAGTTATATGGCTATTAGAAATCCACATTCTTACACAAGAAAGACCATATTAG
1981 AATCTAAGGAAAACATGCATATTCACATTAATTAATCGATCAGATTTTTCCAGAATTCCGTATCAGTCACCATTTTAATATGGGGACAAT 
2071 GAAGACAAGCACACAGGAGGTAGAATATCAGAGTGGGGCTGGATCAAGGGCAAAAACTGGTCATTAAGTCATCTGACATTAAATCATTTA
2161 GCCACTAAGTTATTTGTGTACTCTCACTTTAAACTCACCAAAGAAGATTCTCTTAAAGAAATTATGAAAAATGTACAATTTAACATTTTA 
2251 AATAAATAGTGACAGAAGTTGTTTAAAAA 

FIG. 1— Continued 


(Vysis Inc., Downers Grove, IL) Smartcapture FISH software (ver- 
sion 1.3). QUIPS CGH/Karyotyping software (version 3.0.2) assisted 
in karyotype analysis. 

Northern blot analysis. Human MTN I and II filters and a Human
RNA Master Blot (Clontech) were screened with the 32P-labeled
900-bp EcoRI/EcoRI fragment of the insert of clone HLHFV34.
Hybridization was carried out at 60°C in ExpressHyb solution
(Clontech). Filters were washed twice in 0.1× SSC, 1% SDS at 50°C for 30
min and autoradiographed. Total RNA was isolated from the
osteogenic sarcoma cell line U-2OS as described (Chomczynski and
Sacchi, 1987), separated on 1% agarose formaldehyde gels, and
transferred to GeneScreen Plus. Blots were hybridized with the
32P-labeled 2.5-kb insert of clone HOHCH55 in 5× SSC, 5× Denhardt's
solution, 50% formamide, with 1% SDS and 100 μg/ml denatured
salmon sperm DNA, at 42°C. They were washed twice in 0.1× SSC,
0.1% SDS at 50°C for 30 min and autoradiographed.

RESULTS AND DISCUSSION 

Cloning of a Novel Integrin β Subunit-Related
Molecule

A homology search (Altschul et al., 1990) of a human
EST cDNA database generated through the combined
efforts of Human Genome Sciences, Inc. and The Institute
for Genomic Research (Adams et al., 1995; Feng et
al., 1996), using the known amino acid sequences of
integrin subunits, identified clones HSRAZ62 and HLHFV34
from osteoclastoma and fetal lung cDNA libraries,
respectively, which represented a potential novel
integrin β subunit. The HSRAZ62 clone was sequenced
on both strands, and alignment of the translated
sequence with integrin β subunit sequences revealed
that it encoded two complete cysteine repeat domains
highly similar to those contained in the β integrin
stalk-like structure. However, no N-terminal methionine
initiation codon was present, and the last cysteine
repeat was not followed by a transmembrane domain,
as in integrin β subunits. To isolate the full-length
sequence for this unusual clone, a 223-bp HSRAZ62-derived
PCR product (refer to Materials and Methods)
was used to screen a variety of cDNA libraries including
two prepared from human fetal lung and umbilical
vein endothelial cells, from which positive clones were
obtained. Clone S0003.9 from the fetal lung library and
clone HUVEC5.1.1 from the endothelial cell library
both extended the HSRAZ62 sequence, and a subsequent
screen of the EST database identified the potential
full-length cDNA clone HOHCH55 from an osteoblast
cell cDNA library (Fig. 1A).

Structure of the TIED ("Ten β Integrin EGF-like
Repeat Domains") Molecule

The nucleotide and deduced amino acid sequence of
the complete TIED molecule derived from the composite
cDNA is shown in Fig. 1B. The 2493-nucleotide
sequence includes 219 nucleotides of 5'-untranslated
sequence, a 1485-nucleotide open reading frame encoding
494 amino acid residues, and 789 nucleotides of
3'-untranslated sequence that includes a consensus
AATAAA poly(A) signal followed 18 nucleotides later
by a poly(A) stretch. The presumptive methionine
translational start codon is flanked by sequence that
resembles but is not identical to a classical Kozak
consensus, PurNNAUGPur. Nevertheless it is followed
by a hydrophobic stretch of 23 amino acid residues that
is typical of a signal peptide sequence (Fig. 2A). A
recently submitted EST from the Washington University-NCI
Human EST Project (Accession No.
AA417383) extends the HOHCH55 sequence by 59
nucleotides and incorporates an in-frame stop codon,
rendering it unlikely that the open reading frame extends
upstream of the designated start codon. The putative
signal peptide is followed by a predominantly hydrophilic
domain of 471 amino acid residues, containing 10
EGF-like cysteine-rich repeats. The last repeat is
incomplete, missing the C-terminal cysteine. The predicted
molecular mass of an unglycosylated form of the
mature protein is 51.4 kDa; however, there is one
potential N-linked glycosylation site, Asn 405.
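
The lengths quoted in this paragraph are mutually consistent, and the start-codon context printed in Fig. 1B (AGC upstream of ATG, C immediately downstream) shows why the site only resembles the PurNNAUGPur consensus. A short sketch of both checks follows; the function names are hypothetical, and the flanking bases are read from the figure as reproduced here.

```python
UTR5_NT, ORF_NT, UTR3_NT = 219, 1485, 789   # lengths stated in the text
SIGNAL_AA, MATURE_AA = 23, 471              # signal peptide and following domain
PURINES = {"A", "G"}


def coding_residues(orf_nt):
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_nt % 3:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nt // 3 - 1


def kozak_purines(upstream3, plus4):
    """Check the purine positions of PurNNAUGPur: -3 and +4 relative to the A of AUG."""
    return upstream3[0] in PURINES, plus4 in PURINES


# Bases flanking the start codon as printed in Fig. 1B: ...AGC ATG C...
minus3_ok, plus4_ok = kozak_purines("AGC", "C")

print(UTR5_NT + ORF_NT + UTR3_NT)   # total cDNA length
print(coding_residues(ORF_NT))      # residues in the precursor
print(minus3_ok, plus4_ok)          # which consensus positions are satisfied
```

Under these assumptions the purine at position -3 is present while position +4 is a pyrimidine, which is one way to read "resembles but is not identical to" the consensus.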

As this work was nearing completion, a BLAST 
search of the GenBank database revealed an entry, 
AB008375, whose sequence was essentially identical to 
that of TIED, except that it contains an extra G residue 
at nucleotide position 337, which alters the reading 
FIG. 2. Hydropathy and internal similarity plots of the TIED
molecule. (A) Hydropathy plot (Kyte and Doolittle, 1982) of the 
deduced TIED protein, illustrating the hydrophobicity of the first 23 
amino acid residues predicted to represent a functional signal pep- 
tide. (B) Dot-matrix comparison illustrating the repetitive nature of 
the deduced amino acid sequence of TIED. In this GCG plot, amino 
acid residues in the TIED sequence are compared with one another 
in pairwise fashion. Similarities are converted to dots that form 
clusters and diagonal lines, with complete identity along the central 
diagonal. 

frame. Thus the encoded molecule is N-terminally 
truncated, being identical to TIED C-terminal to amino 
acid residue Gly 113, but extending only 21 residues 
further N-terminal. In addition to several nucleotide 
substitutions, AB008375 harbors a 68-bp deletion (nu- 
cleotides 145 to 212). The encoded molecule was pro- 
posed to be osteoblast-specific, but this seems unlikely 
given our expression data for TIED. 
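
The relationship between the extra G at nucleotide 337 and the point at which the two proteins become identical (Gly 113) is simple codon arithmetic, as is the observation that a 68-bp deletion cannot preserve the reading frame. The following sketch uses hypothetical helper names and the coordinates given in the text, with the A of the start codon as nucleotide 1.

```python
def codon_index(nt_position):
    """1-based codon number containing a 1-based ORF nucleotide position."""
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    return (nt_position - 1) // 3 + 1


def shifts_frame(indel_length):
    """An insertion or deletion alters the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


deletion_length = 212 - 145 + 1   # nucleotides 145 to 212, inclusive

print(codon_index(337))           # codon affected by the extra G
print(deletion_length, shifts_frame(deletion_length))
print(shifts_frame(1))            # the single-base insertion
```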

The TIED EGF-like Domains Are Remarkably
Similar to Those of Integrin β Subunits

The repetitious modular structure of TIED is most
clearly illustrated in Fig. 2B by a dot-matrix comparison,
where the presence of repeats is visualized by lines
and dashes that run parallel to the central diagonal
that marks amino acid identity. Comparison of the
deduced TIED sequence with EGF-like proteins in the
GenBank database revealed that the TIED repeats
were most similar to the β integrin cysteine-rich
repeats. There are two features that distinguish the
integrin β subunit and TIED repeats from the majority
of other EGF-like proteins. The EGF domains of TIED
FIG. 3. The EGF-like domains of TIED are highly similar to those of β integrins. (A) The EGF-like domains of TIED alternate in
homology. Amino acid sequences of the odd (upper) and even (lower) numbered EGF-like repeats in TIED are aligned, with consensus
sequences shown below each group. Consensus amino acid residues are boxed in black. Numeration of amino acid residues is as in Fig. 1B.
(B) The EGF-like domains of the β integrins also alternate in homology. Consensus sequences of the four EGF-like repeats of the human
integrin β1 through β8 subunits have been aligned. Identical amino acid residues are boxed in black. (C) Similarity between the TIED and
the β integrin EGF-like domains extends to large stretches of sequence identity. The first, seventh, and eighth EGF-like repeats of TIED have
been aligned and compared with the second repeat of integrin β7 (Int7.2), the fourth repeat of integrin β3 (Int3.4), and the third repeat of
integrin β6 (Int6.3), respectively. Identical amino acid residues are boxed in black. The percentage identity is indicated in the right margin.
Gaps introduced to optimize the alignments are denoted by tildes (~). In the consensus sequences, uppercase letters indicate that 75 to 100%
of the sequences aligned contain the amino acid at that position; lowercase letters indicate 50 to 75%; plus and minus signs indicate basic
and acidic residues, respectively; and each dot represents the position of an amino acid residue that is not conserved. Conserved Cys residues
are numbered as in Fig. 1B or after Yuan et al. (1990).


and β integrins contain eight cysteines, rather than the
six cysteines found in the "classical" EGF domain (Figs.
3A and 3B). Only the first integrin repeat and the last
TIED repeat are exceptions to the rule, possessing
seven cysteines, as found in some of the EGF-like
modules of fibrillin that have been termed "hybrid"
domains (Corson et al., 1993; Pereira et al., 1993). Other
members of the EGF-like family that possess
eight-cysteine repeats include laminin (Pikkarainen et al.,
1988), fibrillin (Corson et al., 1993; Pereira et al., 1993),
and the latent TGF-β binding proteins (LTBP)
(Kanzaki et al., 1990; Moren et al., 1994; Tsuji et al., 1990;
Yin et al., 1995). The positions of the "extra" two
cysteine residues are unique to TIED and the β integrins
and are not conserved in other eight-cysteine EGF-like
domains present in fibrillin, laminin, and LTBPs (Fig.
4).

Second, the repeats show alternating similarity, 
such that odd-numbered repeats are most similar to 
one another, and vice versa, the even-numbered re- 
peats are more similar to one another than they are to 
the odd-numbered repeats. To our knowledge this par- 
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FIG. 4. Comparison of the cysteine "footprint" of the TIED and /3 integrin EGF-like repeats with those in other proteins harboring eight 
cysteine EGF-like domains. The eight cysteines contained in representative EGF-like repeats found in the laminin Bl chain (LamBl), latent 
TGF-/3 binding protein (LTBP-2), and fibrillin (Fibn) have been aligned with cysteines in the sixth and ninth repeats of TIED and the second 
and third repeats of the integrin j32 subunit (Int2.2 and Int2.3, respectively). The eight cysteine-repeat motifs have been compared with the 
six-cysteine EGF repeat motif. Gaps introduced to optimize the alignments are denoted by tildes ("). Cysteine residues are boxed in black. 


ticular feature is shared only by the (S integrins and the 
TIED molecule. A schematic comparison of the TIED and β integrin
structures is shown in Fig. 5. The odd-numbered TIED repeats are most
similar to the even-numbered β integrin repeats. In particular the
sequence CSGRG is highly conserved, as is the CECD sequence (except
for the fourth integrin repeat) (Figs. 3A and 3B). The number of amino
acids intervening between cysteines at positions 6 and 7 in these
repeats varies markedly for both molecules. Vice versa, the
even-numbered TIED repeats are most similar to the odd-numbered β
integrin repeats, although this is not quite as obvious since the
odd-numbered β integrin repeats appear to have diverged significantly
during evolution. Importantly, the similarity between the integrin and
the TIED repeats does not relate just to conserved cysteine and
glycine residues, but in some regions extends across the entire
sequences. Comparison of the second repeat of the integrin β7 subunit
with the first and seventh TIED repeats reveals 57 and 51% identity
over 35 amino acid residues, respectively, which increases to 66%
similarity when conservative substitutions are taken into account
(Fig. 3C). Likewise, the fourth repeat of the integrin β3 subunit
shares 68% amino acid identity over 34 amino acid residues with the
seventh TIED repeat, and the third repeat of the integrin β6 subunit
shares 52% amino acid identity with the eighth TIED repeat over 31
amino acid residues. Interestingly, EGF domains in several proteins
contain the small tripeptide RGD, which is a major integrin binding
site. The TIED sequence does not include an RGD motif or other common
integrin binding motifs.
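
The identity figures quoted above are ratios of matching positions to compared positions in a pairwise alignment. The following minimal Python sketch shows how such a figure can be computed. It assumes the two sequences are already aligned to equal length and that gaps are written as tildes, as in Fig. 4. The function name and the two toy fragments are hypothetical and are not the TIED or integrin sequences.

```python
def percent_identity(a: str, b: str, gap: str = "~") -> tuple[float, int]:
    """Return (percent identity, number of compared columns) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore alignment columns in which either sequence carries a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs), len(pairs)


# Hypothetical toy fragments for illustration only.
seq_a = "CSGRGKCECD"
seq_b = "CSGHGRCECD"
pid, n = percent_identity(seq_a, seq_b)
print(f"{pid:.0f}% identity over {n} residues")
```

A similarity score that credits conservative substitutions would need a substitution grouping or matrix, which the text does not specify.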


A class of EGF repeats found in functionally diverse proteins contain
Ca2+ binding domains that have the consensus sequence
Asp/Asn-x-Asp/Asn-Glu/Gln-x(n)-Asp/Asn*-x(n)-Phe/Tyr (where n is
variable, and the asterisk indicates possible β-hydroxylation).
Solution structures suggest that a conserved aromatic residue in a
Gly-Aromatic-x-Gly motif between Cys 5 and 6 (Downing et al., 1996;
Rao et al., 1995) and Ca2+ ions (Knott et al., 1996) are both key
elements involved in interdomain interactions that stabilize the
three-dimensional structure of EGF modules. Some of the odd-numbered
TIED repeats and the second β integrin repeat have the sequence
Glu/Asp-x-Asp-Asp/Glu/Gln (where x is the eighth cysteine residue),
resembling part of the core Ca2+ binding sequence.

Alternative Splicing of TIED 3'-Untranslated Regions

The 3'-ends of cDNA clones HUVEC5.1.1 and HLHFV34 diverge from clones
HSRAZ62, S0003.9, and HOHCH55 at nucleotide positions 1476 and 1502,
respectively (Fig. 6A). To determine whether the 3'-untranslated
region might undergo alternative splicing, PCR primers were designed
to the alternative 3'-untranslated regions and used to amplify TIED
transcripts from fetal thymus cDNA. Primer pairs 62F and 57
(5'-TTTAACCTGTCCTAGTGGTG-3'; nucleotides 1565-1584) and 62F and 59
(5'-TGTCTGCAGCATACCTTCC-3'; nucleotides 1796-1814) that should amplify
sequences contained in the S0003.9 and HOHCH55 cDNA clones both
generated correct-sized PCR products, whereas the primer pair 62F/58
(5'-TAATGAATTCCAATGTCTGTGC-3') that should amplify HUVEC5.1.1 and
HLHFV34 sequences failed to generate a PCR product, despite producing
the expected 333- and 379-bp products from the respective plasmid
templates (Fig. 6B).

Since the HSRAZ62 and HOHCH55 clones were from osteoclastoma and
osteoblast cDNA libraries, we examined expression of the alternative
TIED transcripts in human osteogenic sarcoma U-2 OS cells. RT-PCR with
the primer pair 27 (5'-CTGTGGAAACTGCTACTGC-3') and 29
(5'-CGTGCAGGACTGTGGGTGC-3') and primer pair 62F/57 expected to amplify
regions encompassing nucleotides 603 to 951 and 1216 to 1584 generated
products of the expected sizes of 349 and 369 bp, respectively, which
hybridized with a HOHCH55 cDNA probe (Fig. 6C). A 333-bp PCR product
could also be amplified from U-2 OS cDNA using the 62F/58 primer pair.
Thus the 3'-ends of all the cDNA clones are authentic and result from
alternative splicing, where those of HSRAZ62, S0003.9, and HOHCH55 are
expressed in both thymus and osteogenic sarcoma cells, and those of
the HUVEC5.1.1 and HLHFV34 are expressed only in the latter.

FIG. 5. Schematic comparison of the principal structural features of TIED with β integrins. The EGF-like repeats are numbered and
shaded according to their alternating homology. Predominantly hydrophobic uncharged regions are denoted as solid blocks. SP, signal
peptide; TM, transmembrane domain; and Cyto, cytoplasmic domain.

FIG. 6. TIED cDNA clones possess alternative 3'-untranslated regions: authenticating the 3'-ends by RT-PCR. To determine which
of the different 3'-untranslated sequences in the various cDNA clones were authentic, RT-PCR analysis was performed using antisense
primers to the alternative 3'-ends. (A) The locations of PCR primers are shown in the upper schematic diagram where the open
reading frame (ORF) is boxed, and the 3'-untranslated region is denoted by solid or dashed lines. (B) PCR products obtained with
primers 62F/62R (lanes 2 and 3), 62F/57 (lanes 4 and 5), 62F/58 (lanes 6, 7, and 8), and 62F/59 (lanes 9 and 10) were stained with
ethidium bromide. Templates were plasmids containing the cDNA inserts HOHCH55 (lane 2), S0003.9 (lanes 4 and 9), HLHFV34 (lane
6), and HUVEC5.1.1 (lane 8), and cDNA from a fetal thymus library (ATCC) (lanes 3, 5, 7, and 10). A ladder of DNA markers is shown in
lane 1, with the sizes indicated in the left margin. (C) RT-PCR analysis of TIED transcripts in total RNA from human U-2 OS
osteogenic sarcoma cells (lanes 2 to 4) and PCR from a HOHCH55 plasmid template (lanes 5 to 7). PCR primers were 27/29 (lanes 2 and
5), 62F/58 (lanes 3 and 6), and 62F/57 (lanes 4 and 7). Lane 1, DNA markers of 396, 344, and 298 bp. (Top) Ethidium bromide staining of
PCR products; (bottom) the products have been hybridized to the 32P-labeled insert of clone HOHCH55.

FIG. 7. Expression of TIED mRNA transcripts in various human tissues and in U-2 OS osteogenic sarcoma cells. (A) A human RNA
Master Blot (Clontech) was hybridized with a 32P-labeled 900-bp EcoRI/EcoRI fragment from cDNA clone HLHFV34. The entire blot
is shown illustrating detectable expression only in aorta (C2), whereas signals from dots containing poly(A)+ RNA from 49 other
tissues were not above background. The blot was rescreened to distinguish background spots from positive signals. Only the signal
from aorta poly(A)+ RNA was reproduced (not shown). (B) Northern blot of 15 µg of total RNA from U-2 OS cells hybridized with the
32P-labeled insert of clone HOHCH55 (right lane). The left lane illustrates an ethidium bromide stained agarose gel containing the
total RNA isolated from U-2 OS cells. Positions of 28S and 18S rRNAs are indicated in the left margin.

FIG. 8. The human ITGBL1 gene maps to chromosome 13, band 13q33. (A) PCR analysis of a monochromosomal panel of human-rodent
cell hybrid DNAs with the GM-F/GM-R primer pair generated a prominent 222-bp PCR product from chromosome 13, human DNA, and the
HOHCH55 plasmid control. An autoradiograph of the PCR products hybridized with the 32P-labeled insert of clone HOHCH55 is shown.
Lanes correspond to DNA markers of 300 and 200 bp (L); plasmid HOHCH55 control (P); hamster (C), mouse (M), human (H) genomic DNA;
no DNA control (N); and chromosome-specific somatic cell hybrid DNAs (chromosomes 1 to 22, X, and Y). (B) PCR analysis of a second panel
of human-rodent cell hybrid DNAs. PCR amplification with the GM-F/GM-R primer set was from DNAs of cell hybrids that contained
chromosome 13 (MOG34A4, lane 2; DUR4.3, lane 3; SIR74ii, lane 4; LSR34S49, lane 5) and from hybrids that contained all other human
chromosomes except for chromosomes 13, 9, and 19 (TWIN19-D12, lane 6; CTP34B4, lane 7; DTI.2.4, lane 8). Control lanes include the
following: lane 1, no DNA; lane 9, chromosome 9-specific hybrid DNA; lane 10, chromosome 19-specific hybrid DNA; and lane 11, chromosome
13-specific hybrid DNA. (C) Localization of the ITGBL1 gene by FISH. Gray-scale inverted image of a complete metaphase cell showing
fluorescent signals on chromosome 13, band q33, after hybridization of the biotinylated HOHCH55 cDNA probe (left). Idiogram of
chromosome 13 with the q33 band bracketed and aligned with signals from enlarged copies of chromosome 13 selected from three different
metaphase cells (right).
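
The expected RT-PCR product sizes quoted in the text for primer pairs 27/29 and 62F/57 follow directly from the nucleotide coordinates, counting both end positions. The short Python sketch below checks that arithmetic and also compares the lengths of primers 57 and 59 with the coordinate ranges given for them. The function and variable names are illustrative assumptions; the coordinates and primer sequences are those quoted in the text.

```python
def amplicon_length(first_nt: int, last_nt: int) -> int:
    """Length in bp of a product spanning first_nt..last_nt (1-based, inclusive)."""
    if last_nt < first_nt:
        raise ValueError("last_nt must not precede first_nt")
    return last_nt - first_nt + 1


# Regions quoted in the text for RT-PCR from osteogenic sarcoma cell RNA.
products = {"27/29": (603, 951), "62F/57": (1216, 1584)}
for pair, (start, end) in products.items():
    print(pair, amplicon_length(start, end), "bp")

# Primer sequences and the coordinate ranges quoted for them in the text.
PRIMER_57 = "TTTAACCTGTCCTAGTGGTG"   # nucleotides 1565-1584
PRIMER_59 = "TGTCTGCAGCATACCTTCC"    # nucleotides 1796-1814
print(len(PRIMER_57) == amplicon_length(1565, 1584))
print(len(PRIMER_59) == amplicon_length(1796, 1814))
```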

TIED mRNA Is Widely Expressed but Not Abundant 

TIED cDNA clones were detected in osteoclastoma, osteoblast, umbilical
vein, and fetal lung cDNA libraries, suggesting that TIED might be
widely expressed; however, no clones were obtained from fetal heart
and adrenal gland tumor cell derived libraries. Screening of a human
RNA master blot (Clontech) containing RNAs from 50 different tissues
revealed readily detectable expression of TIED mRNA transcripts only
in aorta (Fig. 7A), suggesting that the TIED message is not
particularly abundant in the tissues examined apart from aorta. TIED
transcripts were not detected in either adult or fetal heart,
suggesting that expression was specific for aorta. Northern blot
analysis of total RNA prepared from U-2 OS osteogenic sarcoma cells
revealed a single transcript of approximately 2.8 kb (Fig. 7B).

The Human ITGBL1 Gene Maps to Chromosome 13, 
Band 13q33 

PCR from genomic DNA of a panel of human-rodent hybrid cell lines was
used to map the human ITGBL1 gene to a particular chromosome. The
expected 222-bp PCR product was specifically amplified from human
genomic DNA and from the 289 hybrid, which contains human chromosome
13 and fragments of chromosomes 8, 11, and 12 (Fig. 8A). ITGBL1
sequences were not amplified from hybrids C4A, JIC14, and laA9602+,
which contain human chromosomes 8, 11, and 12. Assignment of the
ITGBL1 gene to chromosome 13 was confirmed by PCR analysis of a second
series of chromosomal hybrids. An ITGBL1 PCR product was amplified
from the DNA of four hybrids that contained chromosome 13 (MOG34A4,
DUR4.3, SIR74ii, and LSR34S49), but not from hybrid DNAs that
contained all other human chromosomes except for chromosomes 13, 9,
and 19 (TWIN19-D12, CTP34B4, and DTI.2.4) (Fig. 8B).

The precise localization of the ITGBL1 gene was determined by FISH
analysis using the HOHCH55 cDNA insert as a probe. Of 40 metaphase
cells examined, 40 showed fluorescent signals on one or both
chromosomes 13, specifically across band q33 (Fig. 8C). No additional
site-specific signals were detected on any other chromosome. Other
genes that have been mapped to chromosome band 13q33 include the pro
alpha 1 and 2 (IV) collagen genes (Boyd et al., 1988), the DNA ligase
IV gene (Wei et al., 1995), and the gene for xeroderma pigmentosum
complementation group G (XPG) (Samec et al., 1994). In terms of
disease association, band 13q33 is a site for integration by human
papilloma virus-33 (Gilles et al., 1996); it is amplified in oral
squamous cell carcinomas (Matsumura, 1995) and is commonly deleted in
ovarian cancer (Yang-Feng et al., 1992).

In summary, we predict that TIED is a secreted protein linked in
evolution to the stalk-like structure of integrin β subunits. Whether
an ancestral TIED-like molecule was integrated into β integrins via
gene conversion and attributes integrins with novel functions is not
known. Given that EGF-like domains participate in protein-protein and
protein-cell interactions, future studies will need to appraise
whether TIED protein has proadhesive, anti-adhesive, and/or growth
factor activities.

Abstract

A cDNA clone, AtELP1 (Arabidopsis thaliana EGF receptor-like protein), was isolated from an Arabidopsis cDNA library
with an oligonucleotide probe corresponding to a highly conserved region of animal β-integrins. The cloning of this cDNA
was previously reported and it has been proposed that AtELP might be a receptor involved in intracellular trafficking. In the
present work, using two specific independent sets of anti-peptide antibodies, we show that AtELP1 is mainly located in the
plasma membrane, supporting another function for this protein. Structural studies, using methods for secondary structure
prediction, indicated the presence of cysteine-rich domains specific to β-integrins. Database searches revealed that AtELP1 is
a member of a multigenic family composed of at least six members in A. thaliana. Northern blot analysis of AtELP1, 2b and 3
was performed on mRNA extracted from cells cultured in normal and stressed conditions, and from several organs and
plants submitted to biotic or abiotic stresses. All the genes are expressed at different levels in the same conditions, but
preferentially in roots, fruits and leaves in response to water deficit. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Arabidopsis thaliana; β-Integrin; Cysteine-rich domain; EGF domain; Plasma membrane receptor


1. Introduction 

Plant cell morphogenesis is the result of numerous mechanisms involved
in the control of cell division and expansion. The cell wall, the
plasma membrane, and the cytoskeleton are considered as the main
actors in the establishment of polarity and morphogenesis. It is now
clear that the membrane and membrane proteins are kept in a dynamic
state to maintain cell structure and compartmentalization [1].
Linkages between the plant plasma membrane and the cell wall can be
observed after plasmolysis; however, the molecules engaged in this
interaction are unknown. In animal cells, integrins are plasma
membrane receptors involved in cellular adhesion. Some of them
recognize extracellular proteins via the RGD sequence, a conserved
motif in adhesion proteins from the extracellular matrix.

* Corresponding author. Fax: +33-562-193502;
E-mail: galaud@cict.fr

The occurrence of integrins in plants has been suggested, but their
identification remains obscure. Two lines of evidence support the
occurrence of integrin-like receptors in plants. On one hand,
immunological cross reactivity between antibodies raised against
animal integrins and plant proteins has been observed. Immunological
approaches have identified several plant proteins sharing common
epitopes with animal integrins [2-7]. Using an animal integrin
polyclonal antibody for screening an Arabidopsis cDNA library, a
membrane-associated protein involved in trafficking was isolated [8].
These results show that there are plant proteins sharing some motifs
with animal integrins, but the proteins have no homology.

On the other hand, RGD peptides interfere with several plant
physiological processes. Indeed, the addition of RGD peptides disrupts
adhesion of tobacco protoplasts derived from NaCl-adapted cells [9].
These peptides also inhibit gravity perception in Chara [10], and
enhance soybean cellular division [2]. Furthermore, in the brown alga
Fucus, polarity determination is affected by the addition of RGD
peptides [11]; and in Uromyces, a plant pathogenic fungus,
appressorium formation is inhibited by the same compounds [5]. In
agreement with these data, Arabidopsis plasma membrane exhibits
specific high affinity binding sites for RGD-containing peptides and
proteins. RGD binding is strongly inhibited by trypsin treatment,
supporting the protein nature of the receptor [12].

In this paper, we use an oligonucleotide screening strategy to clone
integrin-like molecules in Arabidopsis. The oligonucleotide probe is
defined according to a cytoplasmic conserved region of integrin
β-subunit, involved in interactions with the cytoskeleton [13]. If
homology between animal and plant integrins exists, a functional
domain involved in the interaction with cytoskeleton proteins will be
present. Little homology is expected at the extracellular level since
the extracellular matrices of animals and plants are completely
different. The receptors binding to these matrices should reflect
these differences [14].

2. Materials and methods 

2.1. Plant material

Arabidopsis thaliana, ecotype Columbia, was cultured in a growth
chamber under 36 W fluorescent tubes (12 W/m2) with a 16-h light/8-h
dark photoperiod. Plants were grown in pots filled with TKS2 peat
Floratorf supplemented with 1‰ (w/w) nitrate. Arabidopsis cells were
grown in Gamborg liquid medium [15]. Fifteen-ml cell suspensions were
routinely transferred to 300 ml fresh medium in 1000-ml Erlenmeyer
flasks every 2 weeks, and shaken (150 rpm) in continuous light
(60 W/m2) at 26°C. Cells were transferred to fresh culture medium with
or without mannitol (250 mM), and maintained in the dark for different
periods (1-15 days) prior to harvesting.

2.2. cDNA library screening, sequencing and 
computer sequences analysis 

The A. thaliana cDNA library was constructed in pAD-GAL4 vector
(Stratagene) and was kindly provided by B. Lescure (INRA-CNRS,
Auzeville). A 17-mer degenerate oligonucleotide (AARTTYGARAARGARAA)
corresponding to the peptide sequence KFEKEK was synthesized. This
peptide matched a cytoplasmic conserved region from human integrin
(β-subunit) [16]. The oligonucleotide was labeled with [γ-32P]ATP by
terminal transferase and used to screen the cDNA library according to
the Stratagene protocol. Positive clones were selected, excised from
recombinant phage, and introduced into Escherichia coli strain SOLR.
The isolated cDNAs were sequenced according to Sanger et al. [17].
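
The relation between the peptide KFEKEK and the 17-mer probe can be reproduced by back-translating each residue into its IUPAC degenerate codon. The Python sketch below is only an illustration of that bookkeeping. The function names are hypothetical, the codon table covers just the three residues in this peptide, and the observation that the published 17-mer stops before the final wobble base is read off the sequence rather than stated by the authors.

```python
from itertools import product

# IUPAC degenerate codons for the three residues present in the probe peptide.
DEGENERATE_CODON = {"K": "AAR", "F": "TTY", "E": "GAR"}
# IUPAC ambiguity codes used here: R = A or G, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def degenerate_probe(peptide: str, drop_last_wobble: bool = True) -> str:
    """Back-translate a peptide into a degenerate oligonucleotide."""
    oligo = "".join(DEGENERATE_CODON[aa] for aa in peptide)
    # The 17-mer in the text omits the final wobble base of the last codon.
    return oligo[:-1] if drop_last_wobble else oligo


def expand(oligo: str) -> list[str]:
    """List every unambiguous sequence represented by a degenerate oligo."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in oligo))]


probe = degenerate_probe("KFEKEK")
print(probe, len(probe), len(expand(probe)))
```

With five two-fold ambiguous positions the probe is a mixture of 32 distinct sequences.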

The DNA and its deduced protein sequences were examined for homology
in the non-redundant nucleotide and protein sequence databases using
BLAST [18], PRODOM [19], and BLOCKS searches [20]. The amino acid
sequence alignments were carried out on a Macintosh LC630 computer.
The hydrophobicity, surface probability and flexibility profiles were
calculated as described [21-23] with a window size of seven residues,
using MacVector (Kodak). Hydrophobic cluster analysis (HCA) [24,25]
was performed to delineate and compare the hydrophobic clusters along
the amino acid sequences. They were generated on a Macintosh LC using
the program HCA-Plot2 (Doriane, Paris, France).
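
A sliding-window hydrophobicity profile of the kind described here can be sketched in a few lines of Python. The text does not name the scale behind references [21-23], so the Kyte-Doolittle hydropathy values below are an assumption, as are the function name and the choice of an unweighted mean. Only the window size of seven residues comes from the text. The example input is peptide 63 as quoted in Section 2.4.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> list[float]:
    """Mean hydropathy of every full window of `window` residues along seq."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Peptide 63 as quoted in the text; negative means indicate hydrophilic windows.
profile = hydropathy_profile("AEQESQIGKSRGDC")
print([round(v, 2) for v in profile])
```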

2.3. RNA isolation and DNA-RNA hybridization 
analysis 

Total RNA was extracted from Arabidopsis cell suspensions at various
times during the culture and from different organs using the
guanidinium thiocyanate method [26]. Total RNA (15 µg) was separated
on formaldehyde agarose gel and blotted to Nytran (Schleicher and
Schuell) according to the manufacturer's specifications. The
3'-non-coding regions of AtELP1, 2, and 3 were amplified by PCR using
a degenerate primer, deduced from the highly conserved region of the
three clones (5'-ATCATGKCACAGTAYATGCCA-3'), and a TTTTTTTTTTTTTTTW
primer. Each PCR fragment was subcloned in pGEM-T vector (Promega),
sequenced and used as specific probe.

2.4. Antibody production and purification 

Specific antibodies raised against selected peptides derived from
AtELP1 were prepared. Immunogenic peptides were defined by HCA and
prediction of antigenic determinants [27]. Two exposed hydrophilic
regions, located between amino acids 352 and 365 (AEQESQIGKSRGDC,
peptide 63), and amino acids 375-384 (NNRQYRGKLEC, peptide 64), were
defined. BLAST analysis was carried out to check for identical
sequences in the Arabidopsis database. No known protein other than the
AtELP family contained sequence 63 or 64, indicating that the chosen
peptides could be specific for AtELP proteins. Both peptides were
synthesized automatically by stepwise Fmoc-t-butyl solid phase
synthesis [28] in a Synergy Applied Biosystems peptide synthesizer.
Crude synthetic peptides were purified by reverse-phase HPLC. Purified
peptides were characterized by mass spectrometry on a Lasermat
spectrometer (Finnigan), and coupled to the carrier protein. Peptides
were coupled either to thyroglobulin or to bovine serum albumin using
N-succinimidyl-6-maleimidocaproate as coupling reagent.

Before immunization, a sample of preimmune serum was taken and tested
against peptides 63 and 64. In the absence of response, the
immunization was performed. One volume of complete (immunization)
Freund's adjuvant was added to the thyroglobulin-coupled peptide
(250 µg per injection) and injected into rabbits. Two rabbits were
immunized against each coupled peptide every 2 weeks for 3 months. Two
antisera were obtained: serum 630 for peptide 63, and serum 640 for
peptide 64. Antibodies were immunopurified before use. Ten micrograms
of BSA-coupled peptide was separated by SDS-PAGE and transferred to
nitrocellulose. The membrane was stained with Ponceau red. The stained
region was cut out, destained, and blocked with TBS, 0.1% Tween, and
10% non-fat milk for 1 h at room temperature. The membrane was washed
three times (15 min each) with TBS, 0.1% Tween, 1% BSA, and incubated
overnight at 4°C with serum (1/25 dilution). The antibodies were
eluted with 500 µl glycine-EGTA buffer (glycine 0.2 M, EGTA 1 mM,
pH 2.8) and neutralized with 70 µl Tris 1 M pH 8.

2.5. Fractionation of A. thaliana membranes 

Microsomes from Arabidopsis cells were prepared according to Bardy et
al. [29] with a grinding medium containing 0.17 M sucrose, 50 mM KCl,
1 mM DTT, and 10 mM HEPES, pH 7.5. Microsomes were separated by
free-flow electrophoresis with an Elphor Vap-22 electrophoresis unit
(Weber, Kirchheim-Heimstetten, Germany). The electrophoresis medium
contained 0.25 M sucrose, 10 mM KCl, 1 mM MgCl2, 10 mM Tris and 10 mM
boric acid (pH 8.3). The electrode buffer consisted of 100 mM Tris,
100 mM boric acid (pH 8.3). Microsomes were resuspended in
electrophoresis medium and centrifuged for 30 min at 45 000 × g.
Electrophoresis was performed at a 100 mA constant current (about
900 V), sample injection 2 ml h⁻¹, and buffer flow 3.5 ml fraction⁻¹
h⁻¹ at 4°C. The distribution of membranes in each separation was
monitored by absorbance at 280 nm. Membranes were collected from
pooled fractions by centrifugation (30 min at 45 000 × g). Activity of
different marker enzymes was determined as previously described [29].
Protein content was determined as reported [30] with bovine serum
albumin as standard.

2.6. Gel electrophoresis and immunodetection 

Gel electrophoresis was carried out on 11% acrylamide gels. Samples
(50 µg purified protein) were solubilized in 0.125 M Tris pH 6.8, 4%
SDS and 20% glycerol prior to electrophoresis. Proteins were
transferred to nitrocellulose, and incubated overnight with 630 or 640
(1/100 dilution) purified primary antibodies, washed, and revealed
with ImmunoPure ABC phosphatase staining kit (Pierce).

Antibody competition was performed by incubation of 1 mg non-coupled
peptide with its corresponding antibody for 2 h at 37°C. The exhausted
antibody was then incubated with the nitrocellulose membranes.

Fig. 1. Comparison of the predicted protein sequences of AtELP1, 2b, 3, 4, and 5. Residues identical in all the proteins are
highlighted and the conserved cysteine residues are in gray. EGF signature (epidermal growth factor) is shown by * and ICR (integrin
cysteine-rich motif) by a boxed circle. Putative peptide signal and transmembrane domains are boxed. Spaces, denoted by dashes, have
been introduced to optimize the alignment.


3. Results 

3.1. Molecular cloning, homology searches, and
protein sequence analysis

An A. thaliana cDNA library was screened with an oligonucleotide probe
corresponding to a conserved cytoplasmic region of integrin
β-subunits. Seven clones were isolated and partially sequenced. One
clone (2712), presenting a potential transmembrane domain, was
completely sequenced. The cDNA insert was found to be 2314 bp in
length. It encodes a complete 623 amino acid protein, which has a
predicted molecular mass of 70 kDa and a potential membrane-spanning
domain. Database search revealed that this clone was independently
identified by three other groups at the same time [31]. Clone 2712
will be called AtELP1 (A. thaliana EGF-like protein) in this paper.

Southern blot experiments using AtELP1 cDNA as a probe (data not
shown) revealed that AtELP1 belongs to a multigenic family. Homology
searches in A. thaliana EST database showed two nucleic acid sequences
having 81 and 63% homology with AtELP1. These sequences were called,
respectively, AtELP2 (accession number U79960) and AtELP3 (EST
accession number 110G6T7). The complete genomic sequence of AtELP3
(accession number AL021637, pid g2827665) and two other clones
presenting homologies with AtELP1 were deduced from the analysis of
the A. thaliana genomic database. These clones were named AtELP4
(accession number ATAC002338, pid g2347209) and AtELP5 (accession
number ATAC004238, pid g3033390) and their nucleic acid sequences
showed 84 and 61% homology with AtELP1.

Fig. 2. Schematic representation of AtELP2a, 2b, 3, 4, and 5 genomic clones. AtELP2a, 2b (BAC F26C24), 4 and 5 (BAC T09D09,
and BAC F19I3) are located on chromosome 2. AtELP3 (BAC F18F4) is located on chromosome 3. Exons are represented by gray
boxes and introns by white boxes. ATG represents the translation start codon.

Primary and secondary sequence analyses were performed on AtELP1. The
polypeptide chain is rather hydrophilic (64% polar residues) and
contains a high proportion of cysteine (34 residues). Its calculated
isoelectric point is 5.87. The polypeptide shares moderate percentages
of identity (14.8 and 17.8%) and homology (34.6 and 38.8%) with human
β1 and β5 integrins.

The deduced protein sequences of the five clones are shown in Fig. 1.
AtELP proteins ranged from 618 to 630 amino acids. They present the
same common structural features: a potential signal peptide at the
N-terminus, a large N-terminal region, a potential transmembrane
domain, and a short C-terminal region. Alignment of AtELP1, 2, and 4
showed about 80% homology. The N-terminus contains many conserved
regions. The number and position of cysteines is well conserved (34
Cys out of 560 amino acids). Three Cys-rich motifs were found in all
the putative proteins. Two of them (EGF1 and EGF2, in Fig. 1) have the
typical arrangement Cx(3-7)Cx(2-6)Cx(7-10)CxCx(7-12)C common to
epidermal growth factors (EGF) with 6 cysteines in conserved
positions. The third Cys-rich motif has a different organization with
eight cysteines in the following sequence:
Cx(6-8)Cx5CxCxxCxCx(8-13)Cx(1-2)C. This cysteine alignment is
characteristic of β-integrin subunits (ICR, integrin Cys-rich motif in
Fig. 1). The C-terminal domain of AtELPs (34-40 amino acids) contains
a highly conserved sequence of 27 amino acids, but then diverges. It
contains a YMPL site (amino acid 606-609). The Yxxφ motif (x
represents any amino acid and φ a hydrophobic residue) has been
demonstrated to mediate internalization from the cell surface as well
as targeting to intracellular compartments in mammals [32]. The more
divergent sequences correspond to the potential signal peptide, the
putative transmembrane domain, and the C-terminus end.
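
The two cysteine spacing rules given above translate directly into regular expressions, which makes it easy to scan a protein sequence for candidate EGF-type and ICR-type motifs. The Python sketch below is a hedged illustration rather than the search procedure used in the paper. The names `EGF_MOTIF`, `ICR_MOTIF` and `find_motifs` are hypothetical, "x" is taken to mean any residue (cysteine included), and the two test sequences are synthetic strings built only to satisfy the spacing rules.

```python
import re

# Spacing rules transcribed from the text; "." stands for x (any residue).
EGF_MOTIF = re.compile(r"C.{3,7}C.{2,6}C.{7,10}C.C.{7,12}C")           # six cysteines
ICR_MOTIF = re.compile(r"C.{6,8}C.{5}C.C..C.C.{8,13}C.{1,2}C")         # eight cysteines


def find_motifs(seq: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(seq)]


# Synthetic sequences built to fit each rule; they are not AtELP sequences.
egf_like = "C" + "A" * 4 + "C" + "A" * 3 + "C" + "A" * 8 + "CAC" + "A" * 8 + "C"
icr_like = "C" + "G" * 7 + "C" + "G" * 5 + "CGCGGCGC" + "G" * 9 + "CGC"

print(find_motifs("MK" + egf_like, EGF_MOTIF))
print(find_motifs(icr_like, ICR_MOTIF))
```

Because `finditer` reports non-overlapping matches only, closely spaced or overlapping repeats would need a lookahead-based pattern.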

3.2. Structure of genomic clones 

The structure of five AtELP genes is presented in Fig. 2. These
genomic sequences were obtained by the systematic sequencing programs
[33]. Two sequences, AtELP2a (accession number ATAC004705, pid
3252813) and AtELP2b (accession number ATAC004705, pid 3252815, which
corresponds to the cDNA previously described as AtELP2), encode
proteins showing 96.5% identity (22 different amino acids out of 628).
The length of the ORF is the same for both genes, but they have
different intron lengths. AtELP2a and AtELP2b are located on
chromosome 2 in reversed position, and are separated by a single gene.
AtELP2a and 2b could be the result of gene duplication. AtELP4
(accession number ATAC002338, pid 2347209) and AtELP5 (accession
number ATAC004238, pid 3033390) are also located on chromosome 2. The
genomic sequence corresponding to AtELP3 (accession number AL021637,
pid 2827665) is the only one located on chromosome 4. At present, the
AtELP1 genomic clone has not been found. The multigenic family AtELP
is composed of at least six genes having 11-13 exons and 10-12
introns. Among these six genes, only three have so far been found
expressed (AtELP1, AtELP2b, and AtELP3).

Fig. 3. (A) Northern blot analysis of AtELP1, 2b, and 3 gene
expression during cell culture in control and osmotic stress
conditions. Total RNA was extracted from cells cultured with 250 mM
mannitol (osmotic stress) or not (control cells). Total RNA (15 µg)
was separated on formaldehyde gel, blotted and probed with the
32P-labeled specific 3'-UTR from each clone. (B) Northern blot
analysis of AtELP1, 2b, and 3 gene expression in various organs of
Arabidopsis. Total RNA was extracted from: leaves (L), stem (S),
flowers (F), roots (R), siliques (S) and rosettes at different
developmental stages (7w, 7 weeks; 3w, 3 weeks; 2w, 2 weeks; 5w,
5 weeks) and after several stresses (W, wounding; MS, mechanical
stress; WS, water stress).

Fig. 4. Western blot analysis of AtELP1 on membrane fractions purified
by free-flow electrophoresis. Membranes were prepared from cells
cultured in mannitol-containing medium. Fractions number 1 and 2
correspond to plasma membrane, fractions 3 and 4 to endomembrane
(Golgi, ER, mitochondria...) and fraction 5 to tonoplast. Lane 6
corresponds to plasmalemma (lane 2). Fifty µg of protein was separated
by 11% SDS-PAGE, blotted onto nitrocellulose and incubated with
antibody 630 (A) and antibody 640 (B) in the presence (+) or in
absence (-) of the corresponding peptide. Molecular mass standards are
given on the left of the figure.

Sequence analysis of each promoter was carried out using the Transfac
program [34]. No clearly identifiable regions corresponding to
putative cis-acting elements were found in AtELP promoter genes.

V. Laval et al. / Biochimica et Biophysica Acta 1435 (1999) 61-70

Fig. 5. Schematic comparison of AtELP1, 2a, 2b, 3, 4, 5 with other cysteine-rich proteins. Human β5-integrin subunit (accession
number M35011), human EGF receptor (accession number X00588), human LDL receptor (accession number L00352) and the yeast
VPS10 (accession number U07621). Cysteine-rich domains are represented by gray boxes and cysteines by small bars. At the N-ter
end, peptide signal is represented by hatched boxes; transmembrane domains are black boxed and black circles represent putative
glycosylation sites.

3.3. AtELP1, AtELP2b and AtELP3 gene expression
analysis

Analysis of AtELP1 gene expression was done using the 3' non-coding
region, cloned after PCR amplification. Total RNA was extracted from
dark-cultured cells under control conditions or osmotic stress (Fig.
3A). A single signal corresponding to a 2.3-kb transcript was observed.
This signal increased during the culture period under osmotic stress,
compared to control cells. In Fig. 3B, AtELP1 gene expression analysis
was performed on total RNA extracted from different organs or after
various stresses: mechanical, wounding, water deficit, and after
Ralstonia solanacearum infection (data not shown). A weak signal was
observed in young plantlets (rosette stage) compared to other organs
(leaf, root, and stem). Stress was applied on plantlets at the 5-week
rosette stage, and the highest signal was observed in plants left
without water for 24 h. AtELP2b and AtELP3 gene expression was analyzed
using specific probes; the 3' non-coding regions were specifically
PCR-amplified before use. Expression of AtELP2b is weakly enhanced
compared to AtELP1 (Fig. 3A,B) and the basal expression level of AtELP3
is weaker than that of the other genes under the conditions used.

3.4. AtELP1 localization

Two sets of antibodies (630 and 640) raised against peptides derived
from the AtELP1 primary sequence were used for the localization of the
protein. Purified membranes were obtained and characterized as
described [29]. The proteins were separated by SDS gel electrophoresis,
transferred to nitrocellulose membranes, and revealed using antibody
630. A single band around 80 kDa was observed in plasma membrane and
endomembrane enriched fractions (Fig. 4A). The strongest signal was
observed in fraction 2 (plasma membrane). This signal strongly
decreased when antibody 630 was pre-incubated with its corresponding
peptide (lane 6). The same results were obtained when antibody 640 was
used (Fig. 4B).


4. Discussion 

A new class of membrane proteins (AtELPs) has been identified. A
database search revealed that AtELP1 was simultaneously cloned by three
other groups. Whereas the primary sequence of the protein does not show
obvious homology with known proteins from plants or other organisms, it
has been proposed that AtELP might be a receptor involved in plant
intracellular protein trafficking [35,36]. AtELP1's secondary structure
was compared with that of two well-known trafficking sorting proteins:
the yeast VPS10 and the rat mannose-6-phosphate receptor (M6PR).
Dot-plots performed with the PAM250 matrix [37] showed a limited number
of very short diagonals. This indicated that no consistent similarities
occurred between the proteins (data not shown). In addition, the
Cys-rich domains located close to the C-terminus of AtELP1 do not occur
in other proteins.
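
The dot-plot reasoning above can be illustrated with a minimal sketch. It is not the authors' procedure: it scores exact residue identity in a short sliding window instead of using the PAM250 matrix, and the sequences are made-up placeholders rather than AtELP1, VPS10 or M6PR. Related sequences give one long diagonal, whereas unrelated ones give few or no diagonal runs.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return (i, j) start positions where `window` consecutive residues are identical.

    Simplification: exact identity is used here, not PAM250 scoring.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.append((i, j))
    return hits


def longest_diagonal(hits):
    """Length of the longest run of consecutive hits along a single diagonal."""
    hit_set = set(hits)
    best = 0
    for (i, j) in hit_set:
        if (i - 1, j - 1) in hit_set:
            continue  # this hit continues a run that started earlier
        length = 1
        while (i + length, j + length) in hit_set:
            length += 1
        best = max(best, length)
    return best


# Hypothetical toy sequences, for illustration only.
related = dot_plot("MKCCGTWCEL", "AKCCGTWCQL")
unrelated = dot_plot("MKCCGTWCEL", "PLRSDNHYFA")
print(longest_diagonal(related), longest_diagonal(unrelated))
```

A long diagonal corresponds to a collinear stretch of similarity; "a limited number of very short diagonals" corresponds to the second case.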

HCA analysis of AtELP1 and human integrins β1 and β5 reinforced the
structural comparison, indicating that these three proteins exhibit a
very similar molecular organization characterized by the Cys-rich
domains. AtELP1 and β-integrin subunits have 13.1% and 15% cysteine,
respectively, on the stretch preceding the transmembrane domain [38].
These cysteines are presumably disulfide-bonded and such bonding would
necessarily occur in the extracellular domain [39]. As in the
β-integrin family, AtELPs contain two EGF-like signatures. The third
Cys-rich domain present in AtELPs is characteristic of β-integrins, as
indicated in the PROSITE database. Such Cys-rich repeats seem to
stabilize the integrin structure at the base of the protein.

EGF domains have been found in a large number 
of proteins and their common feature is to be present 
in the extracellular domain of membrane or secreted 
proteins, with the exception of a prostaglandin G/H 
synthase. In the SWISS-PROT database, 49 proteins
present the EGF signature, containing six cysteines. 
In the mammalian EGF and LDL receptors, EGF 
regions seem to be involved in receptor-ligand inter- 
actions at the cell surface of animal cells [40,41]. 

Fig. 5 compares AtELP1-5 with other Cys-rich membrane proteins, such as
the LDL receptor, the EGF receptor, the yeast Vps10, and the human
integrin β5. From a structural point of view, AtELPs seem to be closer
to the integrin family than to the endocytic or sorting receptors.
AtELPs and the β-subunit of animal integrins may be derived from a
common ancestor. AtELPs may be considered as integrin orthologous
proteins. Brower et al. [42] suggest that cell surface receptors are
strongly conserved in higher animals, but their molecular evolution
remains obscure. The cloning of two cDNAs encoding integrin β-subunits
from coral and sponge clearly showed that the major structural features
were well conserved. Comparative analysis of the genome of the nematode
Caenorhabditis elegans showed that the main differences observed
between the two genomes corresponded to proteins involved in the
establishment of multicellularity (adhesion molecules) and cell death
machinery (signaling proteins) [43]. Brower et al. [42] indicated that
even if no protein with obvious homology to integrins has been
identified, the existence of integrin-like molecules in plants [4,12]
and in fungi [44,45] has been reported by several authors.

Integrins recognize the RGD sequence in their ligand via interaction
with DxSxS, an integrin binding motif [46]. The presence of an
equivalent motif is observed in AtELP1, 2a, and 2b proteins at the
N-terminal domain. The presence on the same polypeptide of a motif
binding RGD (DxSxS) and an exposed RGD sequence is puzzling. An RGD
sequence in the N-terminus is also present in the extracellular domain
of human β2, β5 and β6 integrins. Papadopoulos et al. [47] reported
that 7182 proteins contain the RGD motif. Of these proteins, only 120
are membrane or membrane-associated proteins having the RGD sequence in
their extracellular domain and some are proteins involved in cell
adhesion processes.
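
As an illustration of how such short motifs are located in a protein sequence, the following sketch scans for RGD and for DxSxS (with x standing for any residue). The function name and the toy sequence are hypothetical; no AtELP sequence is reproduced here.

```python
import re

# Motif name -> regular expression; "." stands for the "x" (any residue).
MOTIF_PATTERNS = {"RGD": "RGD", "DxSxS": "D.S.S"}


def find_motifs(sequence):
    """Return {motif name: [1-based start positions]} for one protein sequence."""
    positions = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile(f"(?=({pattern}))")
        positions[name] = [m.start() + 1 for m in regex.finditer(sequence)]
    return positions


# Hypothetical toy sequence carrying one copy of each motif.
toy = "MADGSGSAAKRGDLL"
print(find_motifs(toy))
```

Because three-residue patterns such as RGD occur frequently by chance, a hit of this kind says nothing about exposure or function, which is consistent with the small fraction of RGD-containing proteins that are extracellular.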

Ahmed et al. [36] showed that the C-terminal region of AtELP1 is
located in the cytosol. The cytoplasmic domain of AtELPs is well
conserved and it contains the Yxxφ motif at a distance of about 20
amino acids from the membrane-spanning region. This sequence is a
recognition motif for endocytosis through clathrin-coated vesicles
[32]. In animal integrins, the presence of the signal sequence (NPxY),
localized about 20-25 amino acids from the transmembrane domain [16],
is required for internalization via clathrin-coated vesicles [48]. In
this line, the vitronectin receptor αvβ5 plays a double role in
fibroblasts; it binds to and directly internalizes vitronectin. The
presence of AtELP on the plasma membrane indicates that the protein
follows the secretory pathway when it is synthesized, and may be
internalized via clathrin-coated vesicles into the vacuolar
compartment. This is supported by the presence of the protein in the
trafficking vesicles [35], as well as by the presence of
internalization signals in the cytoplasmic tail.
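
The positional argument above (a sorting signal about 20 residues from the membrane) can be sketched as a scan of a cytoplasmic tail. Two assumptions are made here: φ is taken as one of F, I, L, M or V, a common convention for "bulky hydrophobic" that the text does not spell out, and the tail sequence is an invented placeholder.

```python
import re

# Assumption: phi (bulky hydrophobic residue) is modelled as F, I, L, M or V.
SORTING_SIGNALS = {"Yxxφ": "Y..[FILMV]", "NPxY": "NP.Y"}


def signals_in_tail(tail):
    """List (signal name, distance) pairs for a cytoplasmic tail.

    `tail` starts at the first residue after the transmembrane segment, so
    `distance` is the number of residues between the membrane and the motif.
    """
    found = []
    for name, pattern in SORTING_SIGNALS.items():
        for match in re.finditer(f"(?=({pattern}))", tail):
            found.append((name, match.start()))
    return sorted(found, key=lambda item: item[1])


# Hypothetical tail: 20 spacer residues followed by a Yxxφ-type signal.
toy_tail = "K" * 20 + "YSAL" + "EE"
print(signals_in_tail(toy_tail))
```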

At present, six genes encoding AtELPs have been identified in
Arabidopsis, but only three are expressed according to EST and cDNA
sequencing programs. The main accumulation of transcripts was observed
in roots, in cultured cells and in young plantlets subjected to water
stress. Previous work by Katembe et al. [6] showed that integrin-like
molecules accumulated in Arabidopsis roots, an important gravity
perception site. Wayne et al. [10] reported that gravisensing of Chara
cells was dependent on RGD binding, and a collagenase treatment
indicated the presence of a PxGP motif in the RGD gravireceptor
sequence. This motif is present in a conserved position in AtELP2a, 2b,
3, and 5 in the large
N-terminal region, but the biological significance is
unknown.

holder Ns) that have no BLAST hit at 99% identity for finished data and
95% identity for light-shotgun data were considered uncovered. The
percentage of each clone not hit by Celera sequence was calculated by
dividing the total length of the uncovered sequence by the sequence
length of the clone. The total number of nucleotides that have no
coverage in the Celera assembled contigs was calculated by summing the
regions of no hits for all the clones that covered Celera contigs by
less than 90% (95% for finished clones). This cutoff value was chosen
to eliminate the occasionally low quality of sequences in the clone
sequence data. The cutoff value of 90% was determined by the amount of
no-hit sequences in 16 light-shotgun clones that are fully contained
within three Celera contigs. A higher cutoff value (95%) was used for
the finished data than for the light-shotgun data, because finished
clones have better sequence quality. The total amount of uncovered
sequence for each light-shotgun clone was calculated by multiplying the
no-hit percentage of the clone by the clone length as determined by
sizing on agarose gels (36). For those light-shotgun clones with
unreported insert sizes, the sequence length, excluding Ns, was used
instead. For finished clones, the amount of uncovered sequence was
calculated by multiplying the no-hit percentage of the clone by the
clone's length. We created 7-kbp subcontig blocks and considered each
block to be fully present in the draft sequence if it was hit by at
least 500 bp of external sequences. We chose these parameters
conservatively, based on the fact that at 1× sequence coverage, the
chance of failing to sample a 7-kbp region covered by a light-shotgun
clone is 1 in 10^6. For the WGS assembly, we identified 1380 blocks
that were hit by less than 500 bp of clone sequence and 794 blocks that
were completely missed by the clone sequence. The total number of
missed blocks is 2174, which represents a total of 15.2 Mbp.
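
The block arithmetic in this note can be checked with a short sketch. The block counts and block size come from the text; the Poisson model and the 500-bp read length are assumptions introduced here only to show that a miss probability of about 1 in 10^6 is plausible at 1× coverage, and they are not taken from the note.

```python
import math

BLOCK_SIZE_BP = 7_000          # 7-kbp subcontig blocks (from the text)
partially_hit_blocks = 1380    # blocks hit by less than 500 bp
unhit_blocks = 794             # blocks completely missed

missed_blocks = partially_hit_blocks + unhit_blocks
missed_mbp = missed_blocks * BLOCK_SIZE_BP / 1e6


def prob_block_unsampled(block_bp, coverage, read_len_bp):
    """Poisson probability that no read starts inside a block.

    Assumption: reads start uniformly at random, so the expected number of
    read starts in the block is coverage * block_bp / read_len_bp.
    """
    expected_reads = coverage * block_bp / read_len_bp
    return math.exp(-expected_reads)


# Assumed read length of 500 bp; 1x coverage as stated in the text.
p_miss = prob_block_unsampled(BLOCK_SIZE_BP, coverage=1.0, read_len_bp=500)
print(missed_blocks, round(missed_mbp, 1), p_miss)
```

Under these assumptions the expected number of reads per block is 14, and exp(-14) is a little under one in a million.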

34. M. Ashburner et al., Genetics 153, 179 (1999).

35. Seven conflicts were identified in this study, six of
which appear to be owing to transposable elements.
The remaining represents a 30-kbp insert within a
Celera contig that does not match the corresponding
clone. This discrepancy is still under investigation.

36. www.sciencemag.org/feature/data/1049666.shl

37. S. Altschul et al., Nucleic Acids Res. 25, 3389 (1997).

38. R. A. Hoskins, personal communication.

39. In order to align the Celera sequences unambiguously
to the external data, all significant HSPs at the param- 
eters given in (27) were screened to identify "mutually 
unique regions" where the clone and contig sequences 
have a unique, reciprocal match relation. 

40. Most negative gaps arise because of inaccuracies in 
the distances implied by bundles — the bundle implies 
a small amount of overlap between two contigs 
because it is actually short, whereas the reality is that 
there is a small gap at that location. In a very small 
number of cases, there is an overlap, but it is because 
the distance estimate is too long by 3 standard 
deviations, or because there is a small bit of foreign 
DNA at the tip of a contig because of untrimmed 
vector or a chimeric read. None of these negative 
gaps has yet been found to imply incorrect assembly. 

41. We wish to thank H. Smith and S. Salzburg for the 
many collegial exchanges, M. Peterson and his team 
for keeping the machines humming, R. Thompson and 
his staff for providing us with an environment con- 
ducive to such an intense effort, and A. Clodek, C. 
Kraft, and A. Deslattes Mays, and their staff for 
getting the data to us.

A comparative analysis of the genomes of Drosophila melanogaster,
Caenorhabditis elegans, and Saccharomyces cerevisiae, and of the
proteins they are predicted to encode, was undertaken in the context of
cellular, developmental, and evolutionary processes. The nonredundant
protein sets of flies and worms are similar in size and are only twice
that of yeast, but different gene families are expanded in each genome,
and the multidomain proteins and signaling pathways of the fly and worm
are far more complex than those of yeast. The fly has orthologs to 177
of the 289 human disease genes examined and provides the foundation for
rapid analysis of some of the basic processes involved in human
disease.


With the full genomic sequence of three ma- 
jor model organisms now available, much of 
our knowledge about the evolutionary basis 
of cellular and developmental processes will 
derive from comparisons between protein do- 
mains, intracellular networks, and cell-cell 
interactions in different phyla. In this paper, 
we begin a comparison of D. melanogaster, 
C. elegans, and S. cerevisiae. We first ask 
how many distinct protein families each ge- 
nome encodes, how the genes encoding these 
protein families are distributed in each ge- 
nome, and how many genes are shared among 
flies, worms, yeast, and mammals. Next we 
describe the composition and organization of 
protein domains within the proteomes of fly, 
worm, and yeast and examine the representa- 
tion in each genome of a subset of genes that 
have been directly implicated as causative
agents of human disease. Then we compare
some fundamental cellular and developmen- 
tal processes: the cell cycle, cell structure, 
cell adhesion, cell signaling, apoptosis, neu- 
ronal signaling, and the immune system. In 
each case, we present a summary of what we 
have learned from the sequence of the fly 
genome and how the components that carry 
out these processes differ in other organisms. 
We end by presenting some observations on 
what we have learned, the obvious questions 
that remain, and how knowledge of the se- 
quence of the Drosophila genome will help 
us approach new areas of inquiry. 

The "Core Proteome" 

How many distinct protein families are encoded in the genomes of D.
melanogaster, C. elegans, and S. cerevisiae (1), and how do these
genomes compare with that of a simple prokaryote, Haemophilus
influenzae? We carried out an "all-against-all" comparison of protein
sequences encoded by each genome using algorithms that aim to
differentiate paralogs (highly similar proteins that occur in the same
genome) from proteins that are uniquely represented (Table 1). Counting
each set of paralogs as a unit reveals the "core proteome": the number
of distinct protein families in each organism. This operational
definition does not include posttranslationally modified forms of a
protein or isoforms arising from alternate splicing.

In Haemophilus, there are 1709 protein coding sequences, 1247 of which
have no sequence relatives within Haemophilus (2). There are 178
families that have two or more paralogs, yielding a core proteome of
1425. In yeast, there are 6241 predicted proteins and a core proteome
of 4383 proteins. The fly and worm have 13,601 and 18,424 (3) predicted
protein-coding genes, and their core proteomes consist of 8065 and 9453
proteins, respectively. It is remarkable that Drosophila, a complex
metazoan, has a core proteome only twice the size of that of yeast.
Furthermore, despite the large differences between fly and worm in
terms of development and morphology, they use a core proteome of
similar size.
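
The counting rule behind these figures (each set of paralogs counts once, each singleton counts once) can be written out as a small sketch. The numbers are those quoted in the text; the function and variable names are illustrative.

```python
def core_proteome(singletons, paralog_families):
    """Core proteome size: unique proteins plus one unit per paralog family."""
    return singletons + paralog_families


# Haemophilus: 1247 proteins without relatives, 178 families of paralogs.
haemophilus_core = core_proteome(singletons=1247, paralog_families=178)

# Core proteome sizes quoted in the text.
CORE = {
    "H. influenzae": 1425,
    "S. cerevisiae": 4383,
    "D. melanogaster": 8065,
    "C. elegans": 9453,
}

fly_to_yeast = CORE["D. melanogaster"] / CORE["S. cerevisiae"]
worm_to_fly = CORE["C. elegans"] / CORE["D. melanogaster"]
print(haemophilus_core, round(fly_to_yeast, 2), round(worm_to_fly, 2))
```

The fly-to-yeast ratio of about 1.8 is what the text rounds to "twice", and the worm-to-fly ratio of about 1.2 is the sense in which the two metazoan core proteomes are of similar size.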
Gene Duplications

Much of the genomes of flies and worms consists of duplicated genes; we
next asked how these paralogs are arranged. The frequency of local gene
duplications and the number of their constituent genes differ widely
between fly and worm, although in both genomes most paralogs are
dispersed. The fly genome contains half the number of local gene
duplications relative to C. elegans (4), and these gene clusters are
distributed randomly along the chromosome arms; in C. elegans there is
a concentration of gene duplications in the recombinogenic segments of
the autosomal arms (7). In both organisms, approximately 70% of
duplicated gene pairs are on the same strand (306 out of 417 for D.
melanogaster and 581 out of 826 for C. elegans). The largest cluster in
the fly contains 17 genes that code for proteins of unknown function;
the next largest clusters both consist of glutathione S-transferase
genes, each with 10 members. In contrast, 11 of 33 of the largest
clusters in C. elegans consist of genes coding for seven transmembrane
domain receptors, most of which are thought to be involved in
chemosensation. Other than these local tandem duplications, genes with
similar functional assignment in the Gene Ontology (GO) classification
(5) do not appear to be clustered in the genome.

We next compared the large duplicated gene 
families in fly, worm, and yeast without regard 
to genomic location. All of the known and 
predicted protein sequences of these three ge- 
nomes were pooled, and each protein was com- 
pared to all others in the pool by means of the 
program BLASTP. Among the larger protein 
families that are found in worms and flies but 
not yeast are several that are associated with 
multicellular development, including ho- 
meobox proteins, cell adhesion molecules, and 
guanylate cyclases, as well as trypsinlike pep- 
tidases and esterases. Among the large families 
that are present only in flies are proteins in- 
volved in the immune response, such as lectins 
and peptidoglycan recognition proteins, trans- 
membrane proteins of unknown function, and 
proteins that are probably fly-specific: cuticle 
proteins, peritrophic membrane proteins, and 
larval serum proteins.

Gene Similarities

What fraction of the proteins encoded by 
these three eukaryotes is shared? Compara- 
tive analysis of the predicted proteins encod- 
ed by these genomes suggests that nearly 
30% of the fly genes have putative orthologs 
in the worm genome. We required that a 
protein show significant similarity over at 
least 80% of its length to a sequence in 
another species to be considered its ortholog 
(6). We know that this results in an underes- 
timate, because the length requirement ex- 
cludes known orthologs, such as homeodo- 
main proteins, which have little similarity 
outside the homeodomain. The number of 
such fly-worm pairs does not decrease much 
as the similarity scores become more strin- 
gent (Table 2A), which strongly suggests that 
we have indeed identified orthologs, which 
may share molecular function. Nearly 20% of 
the fly proteins have a putative ortholog in 
both worm and yeast; these shared proteins
probably perform functions common to all
eukaryotic cells. 
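
The ortholog criterion described above (significant similarity over at least 80% of the protein's length) can be sketched as a filter on pairwise hits. The `Hit` record, its field names and the default E-value cutoff are assumptions for illustration; they do not describe the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise similarity hit (hypothetical record layout)."""
    query_id: str
    subject_id: str
    query_length: int      # residues in the query protein
    aligned_length: int    # query residues covered by the alignment
    e_value: float


def is_putative_ortholog(hit, max_e_value=1e-10, min_coverage=0.80):
    """Apply an E-value cutoff plus the 80%-of-length requirement."""
    coverage = hit.aligned_length / hit.query_length
    return hit.e_value < max_e_value and coverage >= min_coverage


# Toy hits: a full-length match, and a strong match confined to one domain.
full_length = Hit("flyA", "wormA", query_length=400, aligned_length=360, e_value=1e-40)
domain_only = Hit("flyB", "wormB", query_length=400, aligned_length=60, e_value=1e-25)
print(is_putative_ortholog(full_length), is_putative_ortholog(domain_only))
```

The second toy hit shows why the length requirement gives an underestimate: a protein conserved only within a single domain, as with the homeodomain proteins mentioned in the text, is rejected even though its E value is highly significant.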

We also compared the proteins of fly, 
worm, and yeast to mammalian sequences. 
Most mammalian sequences are available as 
short expressed sequence tags (ESTs), so we 
dispensed with the requirement for similarity 
over 80% of the length of the proteins. Table 
2B presents these data. Half of the fly protein 
sequences show similarity to mammalian 
proteins at a cutoff of E < 10^-10 (where E is
the expectation value), as compared to only 36%
of worm proteins. This difference increases
as the criteria become more stringent: 25%
versus 15% at E < 10^-50 and 12% versus 7%
at E < 10^-100. Because many of the comparisons
are with short sequences, it is likely that
many of these sequence similarities reflect 
conserved domains within proteins rather 
than orthology. However, it does suggest that 
the Drosophila proteome is more similar to 
mammalian proteomes than are those of 
worm or yeast. 

Protein Domains and Families 

Proteins are often mosaic, containing two or 
more different identifiable domains, and do- 
mains can occur in different combinations in 
different proteins. Thus, only a portion of a 
protein may be conserved among organisms. 
We therefore performed a comparative anal- 
ysis of the protein domains composing the 
predicted proteomes from D. melanogaster, 
C. elegans, and S. cerevisiae using sequence 
similarity searches against the SWISS-PROT/TrEMBL
nonredundant protein database (7), the BLOCKS
database (8), and the InterPro database (9).
The 200 most common
fly protein families and domains are listed in 
Table 3, and the 10 most highly represented 
families in worm and yeast are shown in 
Table 4. InterPro analyses plus manual data 
inspection enabled us to assign 7419 fly pro- 
teins, 8356 worm proteins, and 3056 yeast 
proteins to either protein families or domain 
families. We found 1400 different protein 
families or domains in all: 1177 in the fly, 
1133 in the worm, and 984 in yeast; 744
families or domains were common to all three 
organisms. 

Many protein families exhibit great dis- 
parities in abundance, and only the C2H2- 
type zinc finger proteins and the eukaryotic 


Table 1. Numbers of distinct gene families versus numbers of predicted
genes and their duplicated copies in H. influenzae, S. cerevisiae, C.
elegans, and D. melanogaster. Row one shows the total number of genes
in each species. Row two shows the total number of all genes in each
genome that appear to have arisen by gene duplication. Row three is the
total number of distinct gene families for each genome. Each proteome
was compared to itself using the same parameters as described in (63).

                                 H. influenzae  S. cerevisiae  C. elegans  D. melanogaster
Total no. of predicted genes          1709           6241         18424         13601
No. of genes duplicated                284           1858          8971          5536
Total no. of distinct families        1425           4383          9453          8065

protein kinases are among the top 10 protein families common to all
three organisms. There are 352 zinc finger proteins of the C2H2 type in
the fly but only 138 in the worm; whether this reflects greater
regulatory complexity in the fly is not known. The protein kinases
constitute approximately 2% of each proteome. Curation of the genomic
data revealed that Drosophila has approximately 300 protein kinases and
85 protein phosphatases, around half of which had previously been
identified. In contrast, there are approximately 500 kinases and 185
phosphatases in the worm; the difference is largely due to the
worm-specific expansion of certain families such as the CK1, FER, and
KIN-15 families. There are currently approximately 600 kinases and 130
phosphatases in humans, and it is expected that these figures will rise
to 1100 and 300, respectively, when the sequence of the human genome is
completed (10). Of the proteins uncovered in this analysis, over 70%
exhibit sequence similarity outside the kinase or phosphatase domain to
proteins in other species. In the kinase group, approximately 75% are
serine/threonine kinases, and 25% are tyrosine or dual-specificity
kinases. Over 90% of the newly discovered kinases are

Table 2A. Similarity of sequences in predicted proteomes of D.
melanogaster, S. cerevisiae, and C. elegans. To be scored as a
similarity, each pairwise similarity was required to extend over more
than 80% of the length of the query sequence at an E value less than
that indicated. For example, in "Fly proteins in Fly-yeast," the column
labeled E < 10^-10 shows the number and percentage of fly proteins that
match yeast proteins at this E value or less and for which more than
80% of the length of the fly protein is aligned with the yeast protein.
Each set of pairs was analyzed without consideration of the third
proteome. The rows labeled "Fly-worm-yeast" report the composition of
an independent clustering in which only groups containing a member from
all three proteomes were counted. The numbers are slightly higher for
the "Fly-worm-yeast" counts than for the "Fly-yeast" or "Worm-yeast"
counts because of sequence bridging; that is, not all sequences within
a group necessarily have a significant match to all other members of
that group. See (6) for details.

                      E < 10^-10     E < 10^-20     E < 10^-50     E < 10^-100
                      (n)    (%)     (n)    (%)     (n)    (%)     (n)    (%)
Fly proteins in:
  Fly-yeast           2345   16.5    1877   13.2    1036    7.3     433    3.1
  Fly-worm            4998   35.2    4212   29.7    2442   17.2    1106    7.8
  Fly-worm-yeast      3303   23.3    2428   17.1    1113    7.8     435    3.1
Worm proteins in:
  Worm-yeast          2184   11.8    1768    9.5     933    5.0     374    2.0
  Fly-worm            4795   25.8    4004   21.6    2403   12.9    1092    5.9
  Fly-worm-yeast      3229   17.4    2439   13.1    1115    6.0     419    2.3
Yeast proteins in:
  Fly-yeast           1856   29.4    1567   24.8     891   14.1     376    6.0
  Worm-yeast          1704   27.0    1425   22.6     802   12.7     335    5.3
  Fly-worm-yeast      1833   29.1    1525   24.2     831   13.2     352    5.6


Table 2B. A comparison of D. melanogaster, C. elegans, and S.
cerevisiae protein sequences to each other and to mammalian sequences
(64). This table reports the number and percent of fly, worm, or yeast
query sequences with similarities less than the indicated E value
cutoffs. For example, in the "Fly vs. Yeast" comparison, 3986 or 28.1%
of fly proteins have a similarity with a yeast protein with an E value
less than 1 × 10^-10. EST E values are not directly comparable to
protein E values, because the resulting alignments are shorter.

Fly vs.
  Yeast              8177 (57.6)   3986 (28.1)   2677 (18.9)   1266 (8.9)     504 (3.6)
  Worm               5110 (36.0)   6743 (47.5)   5180 (36.5)   2832 (19.9)   1197 (8.4)
  Mammalian          5833 (41.1)   7032 (49.5)   5837 (41.1)   3580 (25.2)   1772 (12.5)
  Mammalian ESTs     5386 (37.9)   7329 (51.6)   5352 (37.7)   1775 (12.5)    110 (0.8)
Worm vs.
  Yeast             12541 (68.0)   3582 (19.4)   2378 (12.9)   1106 (6.0)     401 (2.2)
  Fly                8603 (46.7)   7138 (38.8)   5428 (29.5)   2880 (15.6)   1229 (6.7)
  Mammalian         10152 (55.1)   6550 (35.6)   4999 (27.1)   2782 (15.1)   1211 (6.6)
  Mammalian ESTs    10354 (56.2)   6005 (32.6)   4000 (21.7)   1170 (6.4)      68 (0.4)
Yeast vs.
  Fly                2614 (41.9)   2564 (41.0)   1910 (30.6)   1021 (16.4)    408 (6.5)
  Worm               2762 (44.2)   2358 (37.8)   1730 (27.7)    882 (14.1)    348 (5.6)
  Mammalian          3230 (51.7)   2340 (37.5)   1802 (28.9)    992 (15.9)    429 (6.9)
  Mammalian ESTs     3106 (49.7)   2319 (37.1)   1553 (24.9)    503 (8.1)      18 (0.3)



predicted to phosphorylate serine/threonine residues; this group
includes the first atypical protein kinase C isoforms identified in
Drosophila. In addition, we found counterparts of the mammalian kinases
CSK, MLK2, ATM, and Peutz-Jeghers syndrome kinase, and additional
members of the Drosophila GSK3B, casein kinase I, SNF1-like, and
Pak/STE20-like kinase families. In the fly protein phosphatase group,
approximately 42% are predicted to be serine/threonine phosphatases;
48% are tyrosine or dual-specificity phosphatases. Among the newly
discovered phosphatases, 35% are serine/threonine phosphatases, most of
which are related to the protein phosphatase 2C family, and 65% are
tyrosine or dual-specificity phosphatases. The fly and worm both
contain close relatives to many of the known mammalian lipid kinases
and phosphatases; however, no SH2-containing inositol 5' phosphatase
SHIP is apparent. Finally, it has been found that the assembly of
kinase signaling complexes in vertebrate cells is aided by the presence
of scaffolding and adaptor molecules, many of which contain
phosphoprotein binding domains; we found 85 such proteins in the fly,
including counterparts to IRS, VAV, SHC, JIP, and MP1.

Two remarkable findings emerge from the peptidase data that may reflect
different approaches to growth and development in flies, worms, and
humans. The pattern and distribution of peptidase types are similar
between the fly and the worm: there are approximately 450 peptidases in
the fly and 260 in the worm. The difference is due almost entirely to
the expansion or contraction of a single class of trypsin-like (S1)
peptidases. C. elegans has seven of this class and yeast has one, but
the fly has 199. Of these, 163 are small proteins of approximately 250
amino acids containing single trypsin domains; very few are mosaic
proteins. The remainder have either multiple trypsin-like domains or
long stretches of amino acids with no readily identifiable motif,
usually at the NH2-terminus. In humans, trypsin-like peptidases perform
diverse functions in digestion, in the complement cascade, and in
several other signaling pathways (11), and flies may have a similarly
wide range of uses for these proteins. The extensively characterized
members of this family, which include Snake, Easter, Nudel, and
Gastrulation-defective, are all key members of a regulatory cascade
that controls dorsoventral patterning in the fly (12). In addition,
flies have only two members of the M10 class of peptidases, which
include the matrix metalloproteases, collagenases, and gelatinases that
are essential for tissue remodeling and repair in vertebrates.
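
The claim that the fly-worm difference in peptidase number is due almost entirely to the trypsin-like (S1) class can be checked directly from the counts given above. The totals are approximate in the text, so the sketch below is only a rough consistency check; the dictionary layout is illustrative.

```python
# Counts quoted in the text (totals are approximate).
PEPTIDASES = {
    "fly": {"total": 450, "trypsin_like_S1": 199},
    "worm": {"total": 260, "trypsin_like_S1": 7},
}


def without_s1(counts):
    """Number of peptidases left after removing the trypsin-like (S1) class."""
    return counts["total"] - counts["trypsin_like_S1"]


total_gap = PEPTIDASES["fly"]["total"] - PEPTIDASES["worm"]["total"]
s1_gap = PEPTIDASES["fly"]["trypsin_like_S1"] - PEPTIDASES["worm"]["trypsin_like_S1"]
fly_other = without_s1(PEPTIDASES["fly"])
worm_other = without_s1(PEPTIDASES["worm"])
print(total_gap, s1_gap, fly_other, worm_other)
```

With the S1 class set aside, the two organisms have nearly the same number of peptidases (about 250 each), which is what "due almost entirely" implies.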

The number of identifiable multidomain proteins is similar in the fly
and the worm: 2130 and 2261, respectively. Yeast has only 672 (Table
5). Part of this difference is accounted for by proteins with
extracellular domains involved in cell-cell and cell-substrate contacts
(13), such as the immunoglobulin domain-containing proteins, which are
more abundant in flies than in worms (153 versus 70) and are
nonexistent in yeast. Two other common extracellular domains occur in
similar numbers in fly and worm, EGF (110 versus 109, respectively) and
fibronectin type III (46 versus 43), but are rare or absent in yeast.
Extracellular regions of proteins often contain a variety of repeated
domains (14), and so these proteins may account for our finding that
flies have a larger number of proteins with multiple InterPro domains
than either worms or yeast (2107 versus 1747 and 525, respectively)
(Table 6). Some multidomain proteins of the fly are particularly
heterogeneous: Two low-density lipoprotein receptor-related proteins
have 75 InterPro domains each. Another protein of unknown function has
62 InterPro domains; the most heterogeneous worm and yeast proteins
[SWISS-PROT/TrEMBL accession numbers (AC), Q04833 and P32768,
respectively] have 61 and 18 InterPro domains, respectively. There can
be extensive repetition of the same domain within a protein; for
example, an immunoglobulin-like domain is repeated 52 times within one
protein of unknown function in the fly. The large worm protein UNC-89
contains 48 immunoglobulin-like domains (SWISS-PROT/TrEMBL AC, Q17362).
In contrast, the largest number of repeats in yeast, of a C2H2-type
zinc finger domain, occurs nine times in the transcription factor
TFIIIA (SWISS-PROT/TrEMBL AC, P39933).

The heterotrimeric GTP-binding protein 
(G protein)-coupled receptors (GPCRs) are a 
large protein family in flies, worms, and ver- 
tebrates whose members are involved in syn- 
aptic function, hormonal physiology, and the 
regulation of morphological movements dur- 
ing gastrulation and germ band extension 
(75). There are predicted to be at least 700 
GPCRs in the human genome (16) and 
roughly 1 100 GPCRs in C. elegans (77). We 
found approximately 160 GPCR genes in the 
Drosophila genome, 57 of which appear to be 
olfactory receptors. Drosophila, C. elegans, 
and vertebrates each have diverse families of 
odorant receptors that, although recognizable 
as GPCRs, are unrelated by sequence and 
therefore apparently evolved independently. 
The number of odorant receptors in verte- 
brates ranges from around 100 in zebrafish 
and catfish to approximately 1000 in the 
mouse; C. elegans also has approximately 
1000. In the fly, as in zebrafish and mouse, 
there is a correlation between the number of 
odorant receptors and the number of discrete 
synaptic structures called glomeruli in the 
olfactory processing centers of the brain (16,
18). In the mouse, each glomerulus is dedicated
to receiving axonal input from neurons


expressing a particular odorant receptor (16). 
Therefore, the correlation between number of 
odorant receptors and number of glomeruli 
may reflect a conservation in the organiza- 
tional logic of odor recognition in insect and 
vertebrate brains. Although the fly odorant 
receptors are extremely diverse, there are a 
number of subfamilies whose members share 
50 to 65% sequence identity. The distribution 
of odorant receptor genes is different among 
these organisms as well. Unlike C. elegans or 
vertebrate odorant receptors, which are in 
large linked arrays, the fly odorant receptor 
genes are distributed as single genes or in 
arrays of two or three. Vertebrate receptors 
are encoded by intronless genes, but both fly 
and worm receptor genes have multiple in- 
trons. These distinctions suggest that in addi- 
tion to differences in the sequences of the 
odorant receptors of the different organisms, 
the processes generating the families of re- 
ceptors may have differed among the lineages 
that gave rise to flies, worms, and vertebrates. 

The data suggest conservation of hormone 
receptors between flies and vertebrates; nevertheless,
there is a greater diversity of hormone
receptors in both C. elegans and vertebrates
than in Drosophila. Insects are subject
to complex hormonal regulation, but no ap- 
parent homologs of vertebrate neuropeptide 
and hormone precursors were identified. 
However, many receptors with sequence sim- 
ilarity to vertebrate receptors for neurokinin, 
growth hormone secretagogue, leutotropin 
(follicle-stimulating hormone and luteinizing 
hormone), thyroid-stimulating hormone, ga- 
lanin/allatostatin, somatostatin, and vasopres- 
sin were identified. Other GPCRs include a 
seventh Drosophila rhodopsin and homologs 
of adenosine, metabotropic glutamate, γ-aminobutyric
acid (GABA), octopamine, serotonin,
dopamine, and muscarinic acetylcholine
receptors. In addition, there are GPCRs that 
are unique to Drosophila, others with se- 
quence similarity to C. elegans and human 
orphan receptors, and an insect diuretic hormone
receptor that is closely related to vertebrate
corticotropin-releasing factor receptor.
Finally, we found several atypical seven-transmembrane
domain receptors, including
10 Methuselah (MTH)-like proteins and four 
Frizzled (FZ)-like proteins. A mutation in 
mth increases the fly's life-span and its resis- 
tance to various stresses (19); the FZ-like 
proteins probably serve as receptors for dif- 
ferent members of the Wingless/Wnt family 
of ligands. 

Human Disease Genes 

Studies in model organisms have provided 
important insights into our understanding of 
genes and pathways that are involved in a 
variety of human diseases. In order to esti- 
mate the extent to which different types of 
human disease genes are found in flies, 


worms, and yeast, we compiled a set of 289 
genes that are mutated, altered, amplified, or 
deleted in a diverse set of human diseases and 
searched for similar genes in D. melanogaster,
C. elegans, and S. cerevisiae, as described
in the legend to Fig. 1. Of these 289
human genes, 177 (61%) appear to have an 
ortholog in Drosophila (Fig. 1). Only pro- 
teins with similar domain structures were 
considered to be orthologs; this judgment was 
made by human inspection of the InterPro 
domain composition of the fly and human 
proteins. The importance of human inspec- 
tion, as well as consideration of published 
information, is underscored by the fact that 
some sequences with extremely high similar- 
ity scores to proteins encoded by fly genes, 
such as LCK and Myotonic Dystrophy 1, 
were judged not to be orthologous, but others 
with relatively low scores, such as p53 and 
Rb1, were considered to be orthologs. We
attempted this additional level of analysis
only for the fly proteins, as the lower overall 
level of similarity of worm and yeast proteins 
made these subjective judgments even more 
difficult. Some of the human disease genes 
that are absent in Drosophila reflect clear 
differences in physiology between the two 
organisms. For instance, none of the hemo- 
globins, which are mutated in thalassemias, 
have orthologs in Drosophila. In flies, oxy- 
gen is delivered directly to tissues via the 
tracheal system rather than by circulating 
erythrocytes. Similarly, several genes re- 
quired for normal rearrangement of the im- 
munoglobulin genes do not have Drosophila 
orthologs. 
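
As a quick arithmetic check on the fraction quoted above, the following minimal Python sketch recomputes the percentage from the two counts given in the text (177 and 289). The function name is an illustrative assumption and is not part of the original analysis.

```python
def percent_with_ortholog(n_with_ortholog: int, n_total: int) -> float:
    """Return the percentage of surveyed genes that have an apparent ortholog."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_with_ortholog / n_total


# Counts taken from the text: 177 of the 289 human disease genes
# appear to have an ortholog in Drosophila.
fly_percent = percent_with_ortholog(177, 289)
print(f"{fly_percent:.0f}%")  # prints "61%"
```

The same helper could be applied to any subset of the survey, such as the cancer genes, once the underlying counts are known.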

Of the cancer genes surveyed, 68% appear 
to have Drosophila orthologs. In addition to 
previously described proteins, these searches 
identified clear protein orthologs for menin 
(MEN; multiple endocrine neoplasia type 1), 
Peutz-Jeghers disease (STK11), ataxia telangiectasia
(ATM), multiple exostosis type 2
(EXT2), a second bCL2 family member, a
second retinoblastoma family member, and a 
p53-like protein. Despite its relatively low 
sequence similarity to the human genes, the 
Drosophila gene encoding p53 was consid- 
ered an ortholog because it shows a con- 
served organization of functional domains, 
and its DNA binding domain includes many 
of the same amino acids that appear to be hot 
spots for mutations in human cancer. Comparison
of the fly p53-like protein with the
human p53, p63, and p73 proteins suggests 
that it may represent a progenitor of this 
entire family. In mammalian cells, levels of 
p53 protein are tightly regulated in vivo by its 
interaction with the Mdm2 protein, which in 
turn binds to p19ARF (20). This mode of 
regulation, which modulates the activity of 
p53 but probably not of p63 or p73 (21), may 
not apply to the Drosophila protein, because 
we have not been able to identify orthologs of 

Table 3. Number of proteins in D. melanogaster (F), C. elegans (W), and S. 
cerevisiae (Y) containing the 200 most frequently occurring protein domains in 
D. melanogaster. Domain identifiers are from InterPro (9), a new database that 
has begun to integrate the independent databases of localized protein sequence 
patterns into a single resource. The beta release used includes PROSITE, PRINTS, 
and PFAM. InterPro considers a signature to be true if its score is above a 
threshold specified for that signature by the individual database. Results of the 
InterPro analysis may differ from results obtained based on human curation of 
protein families, due to the limitations of large-scale automatic classifications. In 
some instances, different InterPro domains correspond to different features of 
proteins within the same family; for example, IPR001650 and IPR001410 (26 and 
42 in the table). See (62) for live links to the InterPro database. 

No.   Acc. No.   F    W    Y    InterPro Domain Name
97.   IPR000832  24   10   0    G-protein coupled receptors family 2 (secretin-like)
98.   IPR001140  24   30   10   ABC transporter transmembrane region
99.   IPR001214  24   27   6    SET-domain of transcriptional regulators (TRX, EZ, ASH1 etc)
100.  IPR001871  24   18   15   bZIP (Basic-leucine zipper) transcription factor family
101.  IPR002049  23   16   0    Laminin-type EGF-like (LE) domain
102.  IPR002111  23   21   2    Cation channels, 6TM region (transient receptor potential subtype)
103.  IPR000048  22   ?    2    IQ calmodulin-binding domain
104.  IPR001353  22   12   14   Multispecific proteases of the proteasome
105.  IPR001810  22   215  11   F-box domain
106.  IPR002223  22   34   0    Pancreatic trypsin inhibitor (Kunitz) family
107.  IPR000718  21   29   0    Neprilysin metalloprotease (M13) family
108.  IPR000964  21   15   3    Sterile-alpha module (SAM) domain
109.  IPR001311  21   13   0    Solute binding protein/glutamate receptor domain
110.  IPR001394  21   24   18   Ubiquitin carboxyl-terminal hydrolases family 2
111.  IPR001594  21   13   6    DHHC-type Zn-finger
112.  IPR001628  21   224  0    C4-type steroid receptor zinc finger
113.  IPR002017  21   19   ?    Spectrin repeat
114.  IPR002113  21   6    4    Adenine nucleotide translocator 1
115.  IPR002126  21   15   0    Cadherin domain
116.  IPR000195  20   17   12   RabGAP/TBC domain
117.  IPR000198  20   19   10   RhoGAP domain
118.  IPR000795  20   17   15   GTP-binding elongation factor
119.  IPR001930  20   11   4    Membrane alanyl dipeptidase, family M1
120.  IPR002422  20   14   7    Permeases for amino acids & related compounds, family II
121.  IPR000166  19   33   16   Histone-fold/TFIID-TAF/NF-Y domain
122.  IPR000690  19   8    7    RNA-binding protein C2H2 Zn-finger domain
123.  IPR001766  19   ?    4    Fork head domain
124.  ?          19   17   8    Cyclophilin-type peptidyl-prolyl cis-trans isomerase
125.  IPR002293  19   16   25   Permeases for amino acids & related compounds, family I
126.  IPR000175  18   12   ?    Sodium:neurotransmitter symporter family
127.  IPR000330  18   20   17   SNF2 & others N-terminal domain

Table 3 (continued).

No.   Acc. No.   F    W    Y    InterPro Domain Name
128.  IPR000742  18   ?    0    EGF-like domain, subtype 2
129.  IPR000961  18   24   10   Protein kinase C terminal domain
130.  IPR001173  18   17   4    Glycosyl transferase, family 2
131.  IPR000242  17   76   3    Tyrosine specific protein phosphatases
132.  IPR000467  17   11   4    D111 domain
133.  IPR000636  17   22   1    Cation channels, 6TM region (non-ligand gated)
134.  IPR000717  17   13   8    Domain in components of the proteasome, COP9-complex & eIF3 (PCI)
135.  IPR000953  17   15   2    Chromo domain
136.  IPR001071  17   0    0    Alpha-tocopherol transport protein
137.  IPR001163  17   11   16   Small nuclear ribonucleoprotein (Sm protein)
138.  IPR001327  17   4    4    FAD-dependent pyridine nucleotide reductase
139.  IPR001395  17   11   6    Aldo/keto reductase family
140.  IPR001734  17   3    1    Sodium:solute symporter family
141.  IPR001757  17   22   17   E1-E2 ATPases phosphorylation site
142.  IPR001791  17   16   0    Laminin-G domain
143.  IPR001873  17   ?    0    Amiloride-sensitive sodium channel
144.  IPR001969  17   ?    42   Eukaryotic & viral aspartic proteases active site
145.  IPR000087  16   166  0    Collagen triple helix repeat
146.  IPR000253  16   6    16   Forkhead-associated (FHA) domain
147.  IPR000536  16   88   0    Ligand-binding domain of nuclear hormone receptor
148.  IPR001320  16   10   0    Ligand-gated ion channel
149.  IPR001487  16   13   10   Bromodomain
150.  IPR002027  16   11   24   Amino acid permease
151.  IPR002046  16   ?    ?    SAR1 GTP-binding protein family
152.  ?          15   8    1    Generalized PAS domain
153.  IPR000172  15   1    0    GMC oxidoreductases
154.  IPR000251  15   12   7    ?
155.  IPR000569  15   5    5    HECT domain (Ubiquitin transferase)
156.  IPR000772  15   12   0    Lectin domain of ricin B-chain, 3 copies
157.  IPR001223  15   34   1    Glycosyl hydrolases family 18
158.  IPR001609  15   20   5    Myosin head (motor domain)
159.  IPR001828  15   19   0    Receptor family ligand binding region
160.  IPR002129  15   7    1    Pyridoxal-dependent decarboxylase family
161.  IPR002465  15   1    0    Growth factor & cytokine receptor family signature 2
162.  IPR000159  14   11   2    Ras-associated (RalGDS/AF-6) domain
163.  IPR000225  14   6    2    Armadillo/plakoglobin ARM repeat
164.  IPR000279  14   10   8    Actin

No.   Acc. No.   F    W    Y    InterPro Domain Name
165.  ?          14   6    0    Lipocalin & cytosolic fatty-acid binding protein
166.  IPR000577  14   3    2    Carbohydrate kinase, FGGY family
167.  ?          14   0    0    Pheromone/general odorant binding protein, PBP/GOBP family
168.  IPR000884  14   27   0    Thrombospondin type 1 domain
169.  IPR001100  14   5    3    Pyridine nucleotide-disulfide oxidoreductase, class I
170.  IPR001159  14   9    2    Double-stranded RNA binding (DsRBD) domain
171.  IPR001199  14   8    5    Cytochrome B5
172.  IPR001357  14   ?    ?    BRCT domain
173.  IPR001589  14   8    1    Actinin-type actin-binding domain
174.  ?          14   ?    ?    Enoyl-CoA hydratase/isomerase
175.  IPR001878  14   24   9    Zn-finger CCHC type
176.  IPR001952  14   ?    1    Alkaline phosphatase family
177.  IPR002216  14   17   1    Ion transport protein
178.  IPR002464  14   9    8    DEAH-box subfamily ATP-dependent helicase
179.  IPR000107  13   8    3    SPRY domain
180.  IPR000425  13   8    6    MIP family
181.  IPR000508  13   2    3    Signal peptidase
182.  IPR000727  13   14   15   t-SNARE coiled-coil domain
183.  IPR000901  13   6    7    Carbamoyl-phosphate synthase
184.  IPR001461  13   16   7    Pepsin (A1) aspartic protease family
185.  IPR001506  13   36   0    Astacin (Peptidase family M12A) family
186.  IPR001523  13   11   0    'Paired box' domain
187.  IPR001827  13   2    0    'Homeobox' antennapedia-type protein
188.  IPR001876  13   7    1    Zn-finger in Ranbp & others
189.  IPR002423  13   ?    9    TCP-1 (Tailless complex polypeptide)/cpn60 chaperonin family
190.  IPR002893  13   8    1    MYND finger
191.  IPR000461  12   ?    8    Alpha amylase
192.  IPR000798  12   4    0    Ezrin/radixin/moesin family
193.  IPR001023  12   13   14   Heat shock protein hsp70
194.  IPR001508  12   1    0    NMDA receptor
195.  IPR001683  12   ?    ?    PX (Bem1/NCF1/PI3K) domain
196.  IPR001917  ?    6    4    Aminotransferases class-II
197.  IPR001932  ?    9    8    Protein phosphatase 2C
198.  IPR000050  ?    8    ?    Phosphotyrosine interaction domain (PID)
199.  IPR000182  ?    9    9    Acetyltransferase
200.  IPR000243  ?    2    7    Proteasome B-type subunit

either Mdm2 or p19ARF in Drosophila. Interestingly,
likely orthologs of the breast cancer
susceptibility genes BRCA1 and BRCA2
were not found in Drosophila. In most in- 
stances, cancer genes that have a Drosophila 
ortholog also have an ortholog in C. elegans, 
although the extent of sequence similarity to 
the worm gene is lower. In a minority of 
instances, a C. elegans ortholog was clearly 
absent. Cancer genes with orthologs in Drosophila
and apparently not in C. elegans include
p53 and neurofibromatosis type I (22),
the two genes implicated in tuberous sclerosis
(TSC1 and TSC2) (23), and MEN. The two
TSC gene products are thought to bind to 
each other and may function in a pathway 
that is conserved between humans and Dro- 
sophila but is absent in C. elegans and S. 
cerevisiae. However, the limitations of this 
type of analysis are clearly illustrated by our 
inability to find a bCL2 ortholog in C. elegans
using these search parameters. The C.
elegans ced-9 gene has been shown to function
as a bCL2 homolog, and its protein is
23% identical to the human protein over its 
entire length (24). 

Numerous orthologs of neurological 
genes are also found in the Drosophila ge- 


nome. Some, such as Notch (CADASIL syn- 
drome), the beta amyloid protein precursor- 
like gene, and Presenilin (Alzheimer's dis- 
ease), were already known from previous 
studies in the fly. The genome sequencing 
effort has uncovered several additional genes 
that are likely to be orthologs of human neu- 
rological genes, such as tau (frontotemporal 
dementia with Parkinsonism), the Best mac- 
ular dystrophy gene, neuroserpin (familial 
encephalopathy), genes for limb girdle mus- 
cular dystrophy types 2A and 2B, the Fried- 
reich ataxia gene, the gene for Miller-Dieker 
lissencephaly, parkin (juvenile Parkinson's 
disease), and the Tay-Sachs and Stargardt's 
disease genes. Several genes implicated in 
expanded polyglutamine repeat diseases, in- 
cluding Huntington's and spinal cerebellar 
ataxia 2 (SCA2), are found in the fruit fly. 
Most human neurological disease genes sur- 
veyed were also detected in C. elegans, and 
some were even found in yeast, although a 
few examples are apparently present only in 
Drosophila, such as the Parkin and SCA2 
orthologs. 

Among genes implicated in endocrine diseases,
those functioning in the insulin pathway
are mostly conserved. In contrast, members
of pathways involving growth hormone,
mineralocorticoids, thyroid hormone, and the
proteins that regulate body mass in verte- 
brates, such as those encoding leptin, do not 
appear to have Drosophila orthologs. Sur- 
prisingly, a protein that shows significant 
sequence similarity to the luteinizing hor- 
mone receptor is present in Drosophila (25). 
The physiological ligand for this receptor is 
not known. A number of genes that have been 
implicated in human renal disorders have or- 
thologs in Drosophila, despite the differences 
between human kidneys and insect Mal- 
pighian tubules. In many instances, these 
gene products are involved in fluid and elec- 
trolyte transport across epithelia. Not surpris- 
ingly, most disease genes that function in 
intracellular metabolic pathways appear to 
have Drosophila orthologs. 

Developmental and Cellular Processes 

Developmental strategies in various phyla are 
overtly very different, from the fixed cell 
lineage of C. elegans to the syncytial embry- 
ogenic development of the fly, to early em- 
bryogenesis in amphibians and mammals. A 
number of major processes — cell division, 
cell shape, signaling pathways, cell-cell and 

Cancer 

ABL1 

Acute Myeloid Leukemia-DEK 
Adenomat. Polyposis Coli-APC
AKT2

Ataxia Telangiectasia-ATM

BRCA1 

BRCA2 

Basal Cell Nevus-PTC 
B-Cell Lymphoma 2-BCL2
B-Cell Lymphoma 3-BCL3
Bloom-BLM

Burkitt's Lymphoma-MYC

CDKN2C 

CSF1R/C-Fms 

Chk2 Protein Kinase 

PDGFB

CML-BCR

Cyclin D1-CCND1

Cyclin Dep. Kinase 4-CDK4

EGFR 

ERBB2 

ETS 

E-Cadherin-CDH1 
Ewing Sarcoma-FLI1
FGF3 

Fanconi's Anemia A-FANCA 

Fanconi's Anemia C-FANCC 

Fanconi's Anemia G-FANCG 

HNPCC-MSH2 

HNPCC-MSH3 

HNPCC-MSH6 

HNPCC-MLH1 

HNPCC*-PMS2 

KIT 

LCK 

Lymphoma-MCF2 

MADH4

MDM2

MET

MEN***1

MEN***2A-RET 

Multiple Exostosis 1-EXT1 

Multiple Exostosis 2-EXT2 

NTRK1 

Neurofibromatosis 1-NF1 
Neurofibromatosis 2-NF2 
Nijmegen Breakage 1-NBS1 
Nucleoporin-NUP214 
P16-INK4 
P16-INK4A 
P19ARF 
P53 
PTEN 
RAS 
REL 

Retinoblastoma-RB1 
STK11 

Stem Cell Leukemia-TAL1 
Tuberous Sclerosis 1-TSC1 
Tuberous Sclerosis 2-TSC2 
Von Hippel Lindau-VHL 
Wilm's Tumor-WT1 
Xeroderma Pigment. A-XPA 
Xeroderma Pigment. B-ERCC3 
Xeroderma Pigment. D-XPD 
Xeroderma Pigment. F-XPF 
Xeroderma Pigment. G-XPG 






Neurological 

Adrenoleukodystrophy-ABCD1

Alzheimer-PS1

Alzheimer-APP 

Amyotrophic Lat. Sclero.-SOD1 

Angelman-UBE3A 

Aniridia-PAX6 

Best Macular Dystrophy-VMD2 
Ceroid-Lipofuscinosis-PPT
Ceroid-Lipofuscinosis-CLN3
Ceroid-Lipofuscinosis-CLN2
Charcot-Marie-Tooth 1A-PMP22 
Charcot-Marie-Tooth 1B-MPZ 
Choroideremia-CHM 
Creutzfeldt-Jakob-PRNP
Deafness, Hereditary-MYO15
Deafness, X-Linked-TIMM8A
Diaphanous 1-DIAPH1
Dementia, Multi-Infarct-NOTCH3
Duchenne MD*-DMD
Emery-Dreifuss MD*-EMD
Emery-Dreifuss MD*-LMNA
Familial Encephalopathy-PI12
Fragile-X-FRAXA
Friedreich Ataxia-FRDA
Frontotemporal Dement.-TAU
Fukuyama MD*-FCMD 
Huntington-HD 
Limb Girdle MD* 2A-CAPN3 
Limb Girdle MD* 2B-DYSF
Limb Girdle MD* 2E-BSG
Lissencephaly, X-Linked-DCX
Lowe Oculocerebroren.-OCRL
Machado-Joseph-MJD1
Miller-Dieker Lissen.-PAF
Myotonic Dystrophy-DM1
Myotubular Myopathy 1-MTM1
Naito-Oyanagi-DRPLA
Nemaline Myopathy 2-NEB
Neuraminidase Defic.-NEU1
Norrie-NDP
Ocular Albinism-OA1
Oculopharyngeal MD*-PABPN1
Oguchi Type 2-RHKIN
Parkinson-SNCA
Parkinson-PARK2
Parkinson-UCHL1
Prog. Myoclonic Epilepsy-CSTB
Retinitis Pigmentosa-RPGR
Retinitis Pigmentosa 2-RP2
SCA***1-SCA1
SCA***2-SCA2
SCA***6-CACNA1A
SCA***7-SCA7
Spinal Muscular Atrophy-SMN1 
Stargardt-ABCA4 
Tay-Sachs-HEXA 
Thomsen-CLCN1 
Usher-USH2A
Wilson-ATP7B

Cardiovascular

AV Conduction Defects-CSX
HDL Deficiency 1-ABCA1 
Long Q-T 1-KCNQ1 
Long Q-T 2-KCNH2 
Long Q-T 3-SCN5A 
Fam. Cardiac Myopathy-MYH7 

Malformation Syndromes

Aarskog-Scott-FGD1
Achondroplasia-FGFR3
Alagille-JAG1
Barth-TAZ

Beckwith-Wiedemann-CDKN1C
Cerebral Cavern. Malf.-CCM1 
Chondrodyspl. Punct. 1-ARSE 
Cleidocranial Dysplasia-OFCI 
Cockayne I-CKN1 
Coffin-Lowry-RPS6KA3 
Diastrophic Dyspl.-SLC26A2 
EEC 3-Ket. P63 
Greig Cephalopolysynd.-GLI3 
Hand-Foot-Genital-HOXA13
Holoprosencephaly 3-SHH
Holoprosencephaly-SIX3
Holt-Oram-TBX5 
ICF-DNMT3B 
Kallman-KAL1 
Laterality, X-Linked-ZIC3
Melnick-Fraser-EYA1
Nail Patella-LMX1B
Opitz-MID1

Renal Coloboma-PAX2
Rieger, Type I-PITX2
Rubinstein-Taybi-CREBBP
Saethre-Chotzen-TWIST
Septooptic Dysplasia-HESX1
Simpson-Golabi-Behmel-GPC3
Townes-Brocks-SALL1
Treacher-Collins-TCOF1
VMCM-TEK
Waardenburg-PAX3
Zellweger-PEX1

Endocrine 

Adrenal Hypoplasia-NR0B1
Androgen Receptor-AR
Adrenal Hyperplas. III-CYP21A2
Diabetes-INS
Diabetes-INSR
Diabet. Ins. Neurohypop.-AVP
Diabet. w/ Hypertens.-PPARG
Dwarfism-GH1
Dwarfism-GHR
Gonadal Dysgenesis-SRY 
Hyperinsulinism-ABCC8 
Hyperinsulinism-KCNJ11
Hypothyroidism-TRH 
Hypothyroidism-SLC5A5 
Leydig Cell Hypoplasia-LHCGR 
MODY**1-HNF-4A
MODY**2-GCK
MODY**3-TCF1
MODY**4-IPF1
MODY**5-TCF2
McCune-Albright-GNAS1
Non-Insulin Dep. Diabet.-PCSK1
Obesity-LEP
Obesity-LEPR
Obesity-MC4R 
Obesity-POMC 
Pendred-PDS 
Thyr. Resistance-THRA 
Thyr. Resistance-THRB 
Thyrotropin Deficiency-TSHB 
Vitamin-D Resis. Rickets-VDR 


Fig. 1. 


cell-substrate adhesion, and apoptosis — de- 
termine the developmental outcomes of these 
very different embryos. Although there are 
many more, such as the processes that deter- 
mine embryonic gradients, cell polarities, and 
cell movement, here we examine the first 
five, beginning with cell cycle components, 
and examine what new insights have been 
gained from the genomic data that affect our 
knowledge of the evolution of developmental 
processes. We then discuss the processes of 
neuronal signaling and innate immunity. 

Cell cycle. Despite conservation of the 
mechanisms regulating cell cycle progres- 
sion, many of the functions governing this 
progression are encoded by gene families 
whose individual members are not conserved 
between vertebrates and yeast. For example, 
the cyclins of S. cerevisiae can be divided 
into a G1 class (Cln1, Cln2, and Cln3) and an
S/G2 class (Clb1 through Clb6); it is not
possible to identify orthologs of individual 
vertebrate cyclins. Consequently, analysis of 
the roles of particular vertebrate cell cycle 
genes benefits from a genetic model in which 
parallels are more evident. Analysis of the 
Drosophila genome sequence supports and 
extends previous suggestions of strong paral- 
lels between fly and human cell cycle regu- 
lators. Orthologs of vertebrate cell cycle cy- 
clins — cyclin A (CycA), CycB, CycB3, 
CycE, and CycD — have been identified in 
Drosophila, as have orthologs of cyclins that 
appear to have roles in transcription: CycC, 
CycH, CycK, and CycT. Apparent orthologs 
of these cyclins can also be found in C. 
elegans; however, the level of similarity to 
the vertebrate members is invariably substan- 
tially less. Indeed, BLAST comparisons sug- 
gest that vertebrate and Drosophila CycA and 
CycB share more sequence similarity with 
yeast than with proposed C. elegans or- 
thologs. Examination of other cell cycle reg- 
ulators confirms that quite precise compari- 
sons can be made between vertebrates and 
flies; parallels with yeast are looser. For example,
like vertebrates, Drosophila uses several
different cyclin-dependent kinases
(Cdks) to regulate different aspects of the cell 
cycle; S. cerevisiae and Schizosaccharomyces 
pombe use only one. Cloning efforts and the 
genome sequence revealed Drosophila orthologs
of vertebrate Cdk1 (cdc2) and Cdk2
(cdc2c), as well as a single Drosophila Cdk 
(Cdk4/6) with close similarity to both Cdk4 and 
Cdk6. As in vertebrates, Drosophila has two 
distinct kinases that add inhibitory phosphate
to Cdk1, the previously identified
Wee, and a recently recognized homolog of
Myt1, which was initially identified as a
membrane-associated inhibitory kinase in Xenopus
(26). C. elegans also has two homologs
of these kinases (Wee1.1 and Wee1.3);
however, similarity scores do not place
these into distinct Wee1 and Myt1 subtypes.
Each of these genes appears to be
present in a single copy, a factor that sim- 
plifies genetic interpretations. 

The retinoblastoma gene product pRb is a 
crucial cell cycle regulator in mammals and is 
thought to modulate S-phase entry via its 
interactions with the transcriptional regulator 
E2F and its dimerization partner (DP). This 
important mode of regulation is not found in 
yeast, but many components of the Rb pathway
have been identified and studied in Drosophila
(27). The sequencing effort uncovered
a second Rb-related gene in Drosophila
and confirmed the existence of only two E2F
family members and a single DP ortholog. C.
elegans also has an Rb-related gene, isolated
in a genetic screen for mutations affecting
cell fate decisions (28), but it has not been
shown to play a direct role in cell cycle 


regulation. Also evident from the sequence 
are eight skp-like genes and six cullin-related 
genes. The Skp and Cullin proteins function 
in a complex that mediates the degradation of 
specific target proteins during crucial cell 
cycle transitions. Further exploration of the 
genome sequence should define orthologs to 
most vertebrate cell cycle genes and lead to 
genetic tests of their regulation and function. 

Cytoskeleton. A large number of proteins 
link events at the cell surface with cytoskel- 
etal networks and intracellular messengers 
(13). We found approximately 230 genes (ap- 
proximately 2% of the predicted genes) that 
encode cytoskeletal structural or motor pro- 
teins; these represent most major families 
found in other invertebrates and vertebrates 
(29). The fraction of the Drosophila genome 
devoted to cytoskeletal functions appears to 

Alport-COL4A5
Bartter-SLC12A1

Congenital Nephrotic-NPHS1
Dent-CLCN5
Diabetes Insipidus 2-AQP2
Gitelman-SLC12A3
Hyperoxaluria 1-AGXT
1° Hypomagnesemia-CLDN16
Hypophosphatasia-ALPL
Nephronophthisis 1-NPHP1 
Polycystic Kidney 1-PKD1 
Polycystic Kidney 2-PKD2 
Pseudohypoaldoster-NR3C2 
Renal Tubul. Acidosis-ATP6B1
Vitamin D Resis. Rickets-PHEX
Williams-Beuren-ELN 


Hematological 

Chediak-Higashi-CHS1

Diamond-Blackfan Anem.-RPS19

Essen. Thrombocythemia-THPO

G6PD Deficiency-G6PD 

HPLH2-PRF1 

Hemophilia A-F8C 

Hemophilia B-F9 

Hered. Spherocytosis-ANK1 

Megaloblas. Anemia-SLC19A2 

Myeloperoxidase Defic.-MPO 

Osler-Rendu-Weber-ENG 

α-Thalassemia-HBA1

β-Thalassemia-HBB

δ-Thalassemia-HBD

ε-Thalassemia-HBE

Thrombophilia-PLG

Von Willebrand-VWF

Wiskott-Aldrich-WAS



F W 





Immune 

Bare Lymphocyte-ABCB3 

Bare Lymphocyte-RFX5 

Bare Lymphocyte-RFXSAP 

Bare Lymphocyte-MHC2TA 

Bruton Agammaglobulin.-BTK

Chronic Granulom.-NCF1

Chronic Granulom.-CYBB

Immunodeficiency-DNA Ligase 1

Immunodeficiency-CD3G

SCID**-IL2RG

SCID**-IL7R

SCID**-JAK3

SCID**-RAG1

SCID**-RAG2

SCID**-ZAP70

T-Cell Immunodefic.-CD3E

X-Linked Lymphoprol.-SH2D1A 

Metabolic 

CPT2 Deficiency-CPT2 

1° Carnitine Defic.-SLC22A5

Citrullinemia, Type I-ASS

Cystinuria, Type 1-SLC3A1 

Hypercalcemia-CASR 

Galactokinase-GALK1 

Gaucher-GBA 

Hemochromatosis-HFE

Lesch-Nyhan-HPRT1

Liddle-SCNN1G

Liddle-SCNN1B

Menkes-ATP7A 

Niemann-Pick C-NPC1 

SCID**-ADA

Trimethylaminuria-FMO3

Variegate Porphyria-PPOX

Wernicke-Korsakoff-TKT


F W 




Other 

α-1-Antitrypsin Deficiency-PI
Alveolar Proteinosis-SFTPB
Corneal Dystrophy-TGFBI
Cystic Fibrosis-ABCC7
Cystinosis-CTNS
Darier-White-SERCA
Downreg. in Adenoma-DRA
Ehlers-Danlos IV-COL3A1
Fam. Mediterr. Fever-MEFV
Finnish Amyloidosis-GSN
Glycerol Kinase Defic.-GK
Hereditary Pancreatitis-PRSS1
Hermansky-Pudlak-HPS 
Hyperexplexia-GLRA2 
Juvenile Glaucoma-GLC1A 
Keratoderma-KRT9 
Marfan-FBN1 
Mcleod-XK 
Monilethrix-KRTHB 
Monilethrix-KRTHB6 
Osteogenesis Imperf.-COL1A1
Spondyloepip. Dysp.-COL2A1
Vohwinkel-LOR
Wolfram-WFS1


Fig. 1 (continued). Fly (F), worm (W), and yeast (Y) genes showing similarity to human disease 
genes. This collection of human disease genes was selected to represent a cross section of human 
pathophysiology and is not comprehensive. The selection criteria require that the gene is actually 
mutated, altered, amplified, or deleted in a human disease, as opposed to having a function 
deduced from experiments on model organisms or in cell culture. Due to redundancy in gene and 
protein sequence databases, a single reference sequence for each gene had to be chosen. Most 
reference sequences represent the longest mRNA of several alternatives in GenBank. Authoritative 
sources in the literature and electronic databases [Online Mendelian Inheritance in Man (OMIM)] 
were also consulted. In all, 289 protein sequences met these criteria. These were used as queries 
to search a database consisting of the sum total of gene products (38,860) found in the complete 
genomes of fly, worm, and yeast. 12,953 was used as the effective database size (the z parameter 
in BLAST). BLASTP searches were conducted as described for full genome searches, except for the 
z parameter. To control for potential frameshift errors in the Drosophila genome sequence, searches 
against a six-frame translation of the entire genome (using TBLASTN) were also conducted with the 
disease gene sequences using the z parameter above. Only two cases in which matches to genomic 
sequence were better than to the predicted protein were found, and these were manually corrected 
to reflect the better TBLASTN scores in the table. Results are scaled according to various levels of 
statistical significance, reflecting a level of confidence in either evolutionary homology or functional
similarity. White boxes represent BLAST E values >1 × 10^-6, indicating no or weak
similarity; light blue boxes represent E values in the range of 1 × 10^-6 to 1 × 10^-40; purple boxes
represent E values in the range of 1 × 10^-40 to 1 × 10^-100; and dark blue boxes represent E values
<1 × 10^-100, indicating the highest degree of sequence conservation. Actual E values can be found
in the Web supplement to this figure (62), where links to OMIM and GenBank may also be found.
A plus sign indicates our best estimate that the corresponding Drosophila gene product is the 
functional equivalent of the human protein, based on degree of sequence similarity, InterPro 
domain composition, and supporting biological evidence, when available. A minus sign indicates 
that we were unable to identify a likely functional equivalent of the human protein. 
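
The color scale described in this legend can be written as a small lookup. The following is a minimal sketch only: the function name is hypothetical, and the legend does not say which class a value lying exactly on a boundary belongs to, so the boundary handling here is an assumption.

```python
def evalue_bin(e_value: float) -> str:
    """Map a BLAST E value to the color class named in the legend.

    Assumption (not stated in the legend): values exactly at 1e-6 fall in
    the light blue class, and values exactly at 1e-40 or 1e-100 fall in
    the purple class.
    """
    if e_value < 0:
        raise ValueError("E values cannot be negative")
    if e_value > 1e-6:
        return "white"        # no or weak similarity
    if e_value > 1e-40:
        return "light blue"   # 1e-6 down to 1e-40
    if e_value >= 1e-100:
        return "purple"       # 1e-40 down to 1e-100
    return "dark blue"        # highest degree of sequence conservation
```

A call such as evalue_bin(1e-60) returns the purple class; the plus and minus signs in the figure reflect additional manual judgment and are not modeled here.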


be somewhat smaller than that found in C.
elegans (5%) (30); whether this reflects a true
biological difference or a difference in classifi- 
cation criteria remains to be discovered. Of the 
Drosophila cytoskeletal genes, 90 encode pro- 
teins belonging to the kinesin, dynein, or myo- 
sin motor superfamilies, or accessory or regu- 
latory proteins known to interact with the motor 
protein subunits. Approximately 80 genes en- 
code actin-binding proteins, including proteins 
belonging to the spectrin/α-actinin/dystrophin
superfamily of membrane cytoskeletal and actin-
cross-linking proteins. Twenty genes en-
code proteins that are likely to bind microtu- 
bules, based on their similarity to microtubule- 
binding proteins found in other organisms. 


Fourteen genes encode members of the actin 
superfamily, 12 encode members of the tubulin
superfamily, and 5 encode septins. Overall, the
representation of predicted cytoskeletal protein
types and families is similar to what has been
found for C. elegans, although Drosophila has
many more dyneins, probably because C. el-
egans lacks motile cilia and flagella.

Among this collection of cytoskeletal genes 
are several interesting and in some cases long- 
sought genes. One gene encodes a protein with 
striking homology to proteins of the tau/MAP2/
MAP4 family that share a characteristic repeat-
ed microtubule-binding domain. Two encode
new tubulins; one appears most closely related
to α-tubulin, and the other appears most closely
related to β-tubulin, both with approximately
50% identity. Neither new tubulin has greater
similarity to the other, more divergent members
of the tubulin superfamily, such as γ-, δ-, or
ε-tubulin (31). Thus, both Drosophila and C.
elegans appear to lack δ- and ε-tubulin, even
though δ-tubulin is highly conserved between
Chlamydomonas and humans. There are also 
three new members of the central motor domain 
family of kinesins that encode nonmotor pro- 
teins that regulate microtubule dynamics (32). 
There are clear homologs of the dystrophin 
complex and of dystrobrevin. Finally, the fly 
lacks cytoplasmic intermediate filament pro- 
teins, other than nuclear lamins, although other 
invertebrates, including C. elegans, appear to 
have genes encoding these (33). Drosophila 
and C. elegans both also appear to lack a gene 
encoding kinectin, the proposed receptor for 
kinesin and cytoplasmic dynein on vesicles and 
organelles (34). Flies and worms must thus use 
different proteins to link microtubule motors to 
vesicles and organelles. 

Cell adhesion. Cell-cell adhesion and cell-
substrate adhesion molecules have been crucial
to the development of multicellular organisms
and the evolution of complex forms of embry-
ogenesis (13). The transmembrane extracellular
matrix-cytoskeleton linkage via integrins is an-
cient. There are five α and two β integrins in
the fly, two α and one β in C. elegans, and at
least 18 α and eight β in vertebrates. Integrin-
associated cytoplasmic proteins (talin, vinculin,
α-actinin, paxillin, FAK, p130CAS, and ILK)
are encoded by single-copy fly genes, as are 
tensin and syndecan. 

Two genes for type IV collagen subunits 
and genes for the three subunits of laminin 
were already known in the fly. Analysis of 
the genome revealed no more laminin genes 
and only one more collagen, which is closest 
to types XV and XVIII of vertebrates. A 
counterpart of this collagen is found in C. 
elegans, which has on the order of 170 col-
lagens. Most important, it appears that the
core components of basement membranes 
(two type IV collagen subunits, three laminin 
subunits, entactin/nidogen, and one perle- 
can), are all present in flies. This constitution 
of basement membranes was clearly estab- 
lished early in evolution and has been well 
conserved in metazoans; remarkably, the fly 
preserves the linked head-to-head organiza- 
tion of vertebrate type-IV collagen genes. In 
contrast to this conservation, many well- 
known vertebrate integrin (ECM) ligands are 
absent from the fly: fibronectin, vitronectin, 
elastin, von Willebrand factor, osteopontin, 
and fibrillar collagens are all missing. 

The fly has three classic cadherins, two of
which are closely linked, but no protocadherins 
of the type found in vertebrates as clusters with 
common cytoplasmic domains (35). Verte- 
brates have three such clusters encoding over 
50 protocadherins and close to 20 classical
cadherins. The fly has no reelin, an ECM ligand
for CNR-type protocadherins in vertebrates
(36). However, there are other fly proteins with
cadherin repeats, including the previously
known Fat, Dachsous, and Starry night, and a
new very large protein related to Fat. C. elegans
has 15 genes containing cadherin repeats; the
number in humans is now 70 and will undoubt-
edly rise (13).

Cell signaling. Components of known sig- 
naling pathways in the fly and worm have 
largely been uncovered by examinations of de- 
velopmental systems. It is a tribute to the pre- 
vious genetic analyses done in these organisms 
that only a modest number of new components 
of the known signaling pathways were revealed 
by analysis of the genomic sequence. The core 
components defined in flies and worms have 
been used in modified and expanded forms in 
vertebrates (37). The predominant pathways,
namely the transforming growth factor-β (TGF-β),
receptor tyrosine kinase, Wingless/Wnt, Notch/lin-
12, Toll/IL1, JAK/STAT/cytokine, and Hedge-
hog (HH) signaling networks, all have largely
conserved fly and vertebrate components. The
worm, by contrast, does not appear to possess
the HH or Toll/IL1 pathways, nor does it have
all of the components of the Notch/lin-12 net-
work (38). Two new proteins of the TGF-β
superfamily were identified, bringing the total
to seven; all seven are members of the bone
morphogenetic protein (BMP) or β-activin sub-
families. We detected no representatives of the
other branches of this superfamily, namely the
TGF-β, α-inhibin, and Müllerian inhibiting
substance (MIS) subfamilies. Three new mem-
bers of the Wingless/Wnt family were identi- 
fied, bringing the total to seven. Each of these 


proteins has sequence similarity to a differ- 
ent vertebrate Wnt protein; this ancient 
family clearly underwent much of its ex- 
pansion before the divergence of the arthro- 
pod and chordate lineages. There is only 
one member of the Notch and HH families, 
in contrast to the many members of these 
families in vertebrates. 

Apoptosis. The core apoptotic machinery of 
Drosophila shares many features in common 
with that of mammals. Many apoptosis-induc- 
ing signals lead to activation of members of the 
caspase family of proteases. These proteases 
function in apoptotic processes as cell death 
signal transducers and death effectors, and in 
nonapoptotic processes in flies and mammals 
(39). Drosophila contains genes encoding 8 
caspases, as compared to 4 in the worm and at 
least 14 in mammals. Three of the fly caspases 
contain long NH2-terminal prodomains of 100
to 200 amino acids that are characteristic of 
caspases that function as signal transducers. 
These prodomains are thought to mediate 
caspase recruitment into signaling complexes in 
which activation occurs in response to oli- 
gomerization. In one pathway described in 
mammals but not in worms, death signals cause 
the release of proteins, including cytochrome c 
and the apoptosis-inducing factor (AIF), from 
mitochondria (40). The human protein Apaf-1, 
in conjunction with cytochrome c, activates 
CARD domain-containing caspases (41). Dro- 
sophila has an Apaf-1 counterpart, a CARD 
domain-containing caspase, and AIF; Dro- 
sophila also has counterparts to the caspase- 
activated DNase CAD/CPAN/DFF40, its in-
hibitor ICAD/DFF45, and the chromatin con- 
densation factor Acinus (42). 


Pro- and anti-apoptotic BCL2 family 
members regulate apoptosis at multiple 
points (43). Drosophila encodes two BCL2
family proteins, though more divergent fam- 
ily members may exist. Fifteen BCL2 family 
proteins have been identified in mammals 
and two in the worm. In addition, inhibitor of 
apoptosis (IAP) family proteins negatively 
regulate apoptosis (44). They are defined by 
the presence of one or more NH2-terminal
repeats of a BIR domain, a motif that is 
essential for death inhibition. Drosophila has 
four proteins with this motif, as compared to 
seven identified thus far in mammals. There 
are several BIR domain-containing proteins 
in C. elegans and yeast, but none has been
implicated in cell death regulation. Reaper 
(RPR), Wrinkled (W), and Grim are essential 
Drosophila cell death activators (45). Or- 
thologs have not been identified in other or- 
ganisms, but they are likely to exist because 
RPR, W, and Grim induce apoptosis in ver- 
tebrate systems and physically interact with 
apoptosis regulators that include IAPs and the
Xenopus protein Scythe (46), for which there 
is a predicted Drosophila homolog. 

Neuronal signaling. The neuronal signaling 
systems in flies, worms, and vertebrates reveal 
extensive conservation of some components, as 
well as extreme divergence, or the total ab- 
sence, of others. There is no voltage-activated 
sodium channel in the worm (17); flies and 
vertebrates generate sodium-dependent action 
potentials. The fly genome encodes two pore- 
forming subunits for sodium channels (Para and 
NaCP60E), and also four voltage-dependent 
calcium channel α subunits, including one
T-type/α1G, one L-type/α1D (Dmca1D), one
N-type/α1A (Dmca1A), and one protein that is
more similar to an outlying C. elegans protein
than to known vertebrate calcium channels. Ad-
ditional fly calcium channel subunits include
one β, one γ2, and three α2 subunits.

The worm genome encodes over 80 potas- 
sium channel proteins (17); the fly genome has
only 30. The extent to which these different 
family sizes contribute to the establishment of 
unique electrical signatures is unknown. The fly 
potassium channel family includes five Shaker- 
like genes (Shaker, Shab, Shal, and two Shaws); 
a large conductance calcium-activated channel 
gene (slowpoke); a slack subunit relative; three 
members of the eag family (eag, sei, and elk); 
one small conductance calcium-regulated chan- 
nel gene; one KCNQ channel gene; and four 
cyclic nucleotide-gated channel genes. In ad- 
dition, there are 50 TWIK members in the 
worm, but only 11 fly members of the two-
pore/TWIK family with four transmembrane 
domains. There are also three fly members of 
the inward rectifier/two transmembrane family. 
Finally, neither the fly nor the worm has dis- 
cernible relatives of a number of mammalian 
channel-associated subunits such as minK and 
miRP1.


Table 4. The 10 InterPro protein domains occurring in the largest number of different proteins in S.
cerevisiae and C. elegans.

Acc. no.     InterPro domain name                                                      No. of proteins

S. cerevisiae
IPR000719    Eukaryotic protein kinase                                                 119
IPR001680    G-protein beta WD-40 repeats                                              90
IPR001650    DNA/RNA helicase domain (DEAD/DEAH box)                                   75
IPR001138    Fungal transcriptional regulatory protein, N-terminus                     60
IPR001042    TYA transposon protein                                                    57
IPR000504    RNA-binding region RNP-1 (RNA recognition motif)                          55
IPR001410    DEAD/DEAH box helicase                                                    48
IPR000822    Zinc finger, C2H2 type                                                    47
IPR001066    Sugar transporter                                                         46
IPR001969    Eukaryotic and viral aspartyl proteases active site                       42

C. elegans
IPR000168    7-Helix G-protein coupled receptor, nematode (probably olfactory) family  545
IPR000694    Proline-rich region                                                       398
IPR000719    Eukaryotic protein kinase                                                 388
IPR002356    G-protein-coupled receptors, rhodopsin family                             335
IPR001628    C4-type steroid receptor zinc finger                                      224
IPR001810    F-box domain                                                              215
IPR000087    Collagen triple helix repeat                                              166
IPR001304    C-type lectin domain                                                      165
IPR002900    Domain of unknown function                                                142
IPR000822    Zinc finger, C2H2 type                                                    138

There are also major differences postsynap-
tically. C. elegans has approximately 100 mem-
bers of a family of ligand-gated ion channels
(17); flies have about 50. The worm has 42
nicotinic acetylcholine receptor subunits and 37
GABA(A)-like receptor subunits; the fly con-
tains only 11 nicotinic receptor subunit genes
and 12 GABA(A)/glycine-like receptor subunit
genes. In contrast, there are 30 members of the
excitatory glutamate receptor family in the fly
but only 10 in the worm. These include sub-
types of the AMPA, kainate, NMDA, and delta
families. In addition, the fly genome contains a 
large number of PDZ-containing genes, ap- 
proximately a dozen of which encode proteins 
that have high sequence similarity to mamma- 
lian proteins that interact with specific subsets 
of ion channels. We also found a number of 
additional ion channel families, including three 
voltage-dependent chloride channels, 14 Trp- 
like channels, 24 amiloride-sensitive/degenerin- 
like sodium channels, one ryanodine receptor, 
one IP3 (inositol 1,4,5-trisphosphate) receptor,
eight innexins, and two porins. C. elegans is 
missing a nitric oxide synthase gene, copies of 
which occur in fly and vertebrate genomes. 

A large array of proteins mediates specific 
aspects of synaptic vesicle trafficking and con- 
tributes to the conversion of electrical signals to 
neurotransmitter release. These components of 
exocytosis and endocytosis are relatively well 
conserved with respect to both domain struc- 
tures and amino acid identities (50 to 90%). The 
fly has enzymes for the synthesis of the neuro- 
transmitters glutamate, dopamine, serotonin, 
histamine, GABA, acetylcholine, and octopam- 
ine, and a family of conserved transporters is 
likely to be involved in loading vesicles with 
these neurotransmitters. The conserved vesicu- 
lar trafficking proteins, with 50 to 80% amino 
acid identity, include members of the Munc-18, 
SCAMP, synaptogyrin, HRS2, tomosyn, cys- 
teine string protein, exocyst (SEC 5, 6, 7, 8, 10,
13, 15, EXO 70, and EXO 84), synapsin, rab-
philin-3A, RIM, rab-3, CAPS, Mint, Munc-13,
NSF, α and γ SNAP, DOC-2B, latrophilin,
Veli, CASK, VAP-33, Snapin, SV2, and com- 
plexin families. Generally, there is only one 
homolog in Drosophila for every three to 
four isoforms in mammals. However, there are 
eight fly synaptotagmin-like genes, making this 
the largest family of vesicle proteins in Dro- 
sophila (47). However, there is no homolog of 
synaptophysin, an early candidate for a vesicle 
fusion pore, which indicates a nonessential role 
in exocytosis for this particular protein across 
phyla. 

Membrane trafficking also requires inter- 
actions between compartment-specific vesic- 
ular and target membrane proteins (v- 
SNAREs and t-SNAREs, respectively), 
whose subcellular distribution and combina- 
torial binding patterns are predicted to define 
organelle identity and targeting specificity 
(48). The completed fly genome allows us to
address whether there is any correlation be-
tween the increased developmental complex- 
ity of multicellular organisms and a larger 
number of SNAREs than that found in uni- 
cellular organisms. In the fly, we find six 
synaptobrevins, three SNAP-25s, 10 syntax- 
ins, and four additional t-SNAREs (membrin, 
BET1, UFE1, and GOS28), and the number 
of SNAREs is similar between yeast (49) and 
Drosophila. Thus, basic subcellular compart- 
mentalization and membrane trafficking to 
and between these various compartments has 
not changed dramatically in multicellular ver- 
sus unicellular organisms. Dynamin, clathrin, 
the clathrin adapter proteins, amphiphysin, 
synaptojanin, and a number of additional 
genes that encode proteins with defined en- 
docytotic motifs are all present. 

In contrast to the conservation of the syn- 
aptic vesicle trafficking machinery, the few 
identified proteins present at mammalian ac- 
tive zones, namely aczonin, bassoon, and pic- 
colo, do not have relatives in Drosophila. 
There are, however, numerous proteins in the 
fly with combinations of C2 domains, PDZ 
domains, zinc fingers, and proline-rich do- 
mains, indicating that the precise protein 
composition of active zones is likely to vary 
among metazoans. In addition, Drosophila 
contains a neurexin III gene and four neuroli- 
gin genes that may be part of a neurexin- 


neuroligin complex that has been widely pro- 
posed to provide a synaptic scaffold for link- 
ing pre- and postsynaptic structures in mam-
mals (50). Potential agrin and Musk genes are 
also present, though the overall sequence 
similarity is low. 

Immunity. Multicellular organisms have 
elaborate systems to defend against microbial 
pathogens. Only vertebrates have an acquired 
immune system, but both vertebrates and in- 
vertebrates share a more primitive innate im- 
mune system. Innate immunity is based on 
the detection of common microbial molecules 
such as lipopolysaccharides and peptidogly- 
cans by a class of receptors known as pattern 
recognition receptors (51). We identified a
large family of genes encoding homologs of
receptors that are involved in microbial rec-
ognition in other organisms. These include
two new homologs of the Drosophila Scav-
enger Receptors (dSR-CI), nine members of
the CD36 family, 11 members of the pepti-
doglycan recognition protein (PGRP) family,
three Gram-negative binding protein (GNBP)
homologs, and several lectins (52).

The recognition of infection by immuno- 
responsive tissues induces a battery of de- 
fense genes via Toll/nuclear factor kappa B 
(NF-κB) pathways in both Drosophila and
mammals (53). The Toll receptor was initial- 
ly discovered as an essential component of 


Table 5. Proteins in D. melanogaster, C. elegans, and S. cerevisiae with more than one InterPro domain.
These numbers represent the total number of recognizable domains within a single protein, no matter
whether they are multiple copies of the same domain or different domains.

InterPro domains    D. melanogaster         C. elegans              S. cerevisiae
per protein         (number of proteins)    (number of proteins)    (number of proteins)

2                   920                     1236                    410
3                   388                     458                     121
4                   219                     182                     58
5                   163                     98                      26
6                   101                     72                      17
7                   92                      53                      15
8                   58                      27                      7
9                   42                      25                      4
10                  22                      18                      7
11-15               73                      43                      6
16-20               18                      17                      1
21-30               22                      22                      0
31-50               8                       5                       0
51-75               4                       5                       0


Table 6. Proteins in D. melanogaster, C. elegans, and S. cerevisiae with multiple different InterPro
domains. Individual InterPro domains are counted only once per protein, regardless of how many times
they occur in that protein.

Unique InterPro     D. melanogaster         C. elegans              S. cerevisiae
domains per         (number of proteins)    (number of proteins)    (number of proteins)
protein

2                   1474                    1248                    402
3                   413                     335                     95
4                   156                     114                     23
5                   52                      38                      4
6                   8                       9                       1
7 or more           4                       3                       0

the pathway that establishes the dorsoventral
axis of the Drosophila embryo. Recent genetic 
studies now reveal that Toll signaling pathways 
are key mediators of immune responses to fungi 
and bacteria in both Drosophila and mice (53). 
We found seven additional homologs of Toll 
proteins in Drosophila, all of which are more 
similar to each other than to their mammalian 
counterparts. Some of these other Toll proteins, 
like 18-wheeler, will probably mediate innate 
immune responses. In Drosophila, infection by 
at least some microbes induces a proteolytic 
cascade that leads to the processing of Spaetzle 
(SPZ), a cytokine-like protein, which then acti- 
vates Toll (53). We found two proteins related 
to SPZ with similarities that include most or all 
of the cysteine residues of SPZ. Given the 
presence of multiple Toll-like receptors in Dro- 
sophila, these new SPZ-like proteins may also 
function in the immune system. With the ex- 
ception of the two I-κB kinase homologs and
the three rel proteins (Dorsal, Dif, and Relish), 
the Drosophila genome appears to contain only 
single copies of the genes encoding intracellular 
components of the Toll pathway: Tube, Pelle, 
and Cactus. How do the different Toll receptors 
trigger specific immune responses using the 
same intracellular intermediates? One explana-
tion is that additional signaling components 
remain unidentified; another explanation is 
crosstalk with other signaling pathways. In con- 
trast, a Toll ortholog has not been identified in 
C. elegans, although there are some Toll-like 
receptors. C. elegans, in addition, does not pos- 
sess homologs of NF-κB/dorsal transcriptional
activators that function downstream of Toll. 
Although it is probable that the worm has re- 
tained parts of the innate immunity network, 
there is no clear evidence of an inducible host 
defense system in the worm. 

One of the most potent innate immune 
responses in insects is the transcriptional in- 
duction of genes encoding antimicrobial pep- 
tides (53). In contrast to Metchnikowin, 
Drosocin, and Defensin peptides, which are 
encoded by single genes, the sequence data 
indicate that, like the previously identified
cecropin clusters, several antimicrobial pep-
tides are encoded by gene families that are
larger than previously suspected. Four genes
appear to encode antifungal peptide Droso-
mycin isoforms, and two genes each code for
the antibacterial proteins Attacin and Dipteri- 
cin. These additional genes may generate 
peptides with slightly different spectra of an- 
timicrobial activity or may simply amplify 
the antimicrobial response. 

Concluding Remarks 

What have we learned about the proteins 
encoded by the three sequenced eukaryotic 
genomes? Some information emerges readily 
from the comparison of the fly, worm, and 
yeast genomes. First, the core proteome sizes 
of flies and worms are similar and are only 


twice the size of that of yeast. This is perhaps 
counterintuitive, because the fly, a multicel- 
lular animal with specialized cell types, com- 
plex development, and a sophisticated ner- 
vous system, looks more than twice as com- 
plicated as single-celled yeast. The lesson is 
that the complexity apparent in the metazoans 
is not achieved by sheer number of genes 
(54). Second, there has been a proliferation 
of bigger and more complex proteins in the 
two metazoans relative to yeast, including, 
not surprisingly, more proteins with extracel- 
lular domains involved in cell-cell and cell- 
substrate interactions. Finally, the population 
of multidomain proteins is somewhat larger 
and more diverse in the fly than in the worm. 
There is presently no practical way to quan- 
tify differences in biological complexity be- 
tween two organisms, however, so it is not 
possible to correlate this increased domain 
expansion and diversity in the fly with differ- 
ences in development and morphology. 

The availability of the annotated sequence 
of the Drosophila genome enhances the fly's 
usefulness as an experimental organism. By 
greatly facilitating positional cloning, the ge- 
nome sequence will increase the efficiency of 
genetic screens that seek to identify genes 
underlying many complex processes of cell 
biology, development, and behavior. Such 
screens have been the mainstay of Drosoph- 
ila research and have contributed enormously 
to our knowledge of metazoan biology. The 
genome sequencing effort has revealed a 
number of previously unknown counterparts 
to human genes involved in cancer and neu- 
rological disorders; for example, p53, menin,
tau, limb girdle muscular dystrophy Type 2B,
Friedreich ataxia, and parkin. All of these fly
genes are present in a single copy in the 
genome and can be genetically analyzed 
without uncertainty about redundant copies. 
More genetic screens are important in order 
to uncover interacting network members. Or- 
thologs of these network members can then 
be sought in the human genome to determine 
if alterations in any of them predispose hu- 
mans to the disease in question, an experi- 
mental paradigm that has already been suc- 
cessfully executed in several cases. Flies can 
also play an important role in exploring ways 
to rectify disease phenotypes. For example, at 
least 10 human neurodegenerative diseases 
are caused by expansion of polyglutamine 
repeats (55). Human proteins containing ex- 
panded polyglutamine repeats have been ex- 
pressed in flies, resulting in the formation of 
nuclear inclusions that contain the protein as 
well as other shared components (56), just as 
in humans. It has been shown that directed 
expression of the human HSP70 chaperone in 
the fly can totally suppress neurodegenera- 
tion resulting from expression of the human 
spinocerebellar ataxia type 3 protein (57). 
The power and speed of this in vivo system 


are unparalleled, and we anticipate the in- 
creased use of such "humanized" fly models. 

Knowing the complete genomic sequence 
also allows new experimental approaches to 
long-standing problems. For example, it 
makes it possible to study networks of genes 
rather than individual genes or pathways. As- 
saying the level of transcription of every gene 
in the genome makes it at least theoretically 
possible to monitor the expression of an en- 
tire network of genes simultaneously. One 
problem that is approachable this way is the 
combinatorial control of gene transcription. 
The fly genome appears to encode only about 
700 transcription factors, and mutations in 
over 170 have already been isolated and char- 
acterized. The techniques are available to 
measure the changes in expression of every 
gene in individual cell types as a consequence 
of loss or overexpression of each transcrip- 
tion factor. We can look for common se- 
quence elements in the promoters of coregu- 
lated genes and perform chromatin immuno- 
precipitation to identify the in vivo binding 
sites of individual factors. For the first time, 
we can envision obtaining the data needed to 
understand the behavior of a complex regu- 
latory network. Of course, collecting these 
data is a massive task, and developing meth- 
ods to analyze the data is even more daunting. 
But it is no longer ludicrous to try. 
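
As a toy illustration of looking for common sequence elements in the promoters of coregulated genes, the sketch below lists every k-mer that occurs in all of a set of promoter sequences. Everything in it is an assumption made for illustration: the function name, the gene labels, the invented sequences, and the choice of exact 6-mer matching on one strand only. Real motif discovery would also need to handle both strands, degenerate positions, and background frequencies.

```python
def shared_kmers(promoters: dict[str, str], k: int = 6) -> list[str]:
    """Return the k-mers present in every promoter, sorted alphabetically.

    Only exact matches on the given strand are considered.
    """
    kmer_sets = []
    for seq in promoters.values():
        seq = seq.upper()
        # Collect every window of length k in this promoter.
        kmer_sets.append({seq[i:i + k] for i in range(len(seq) - k + 1)})
    if not kmer_sets:
        return []
    return sorted(set.intersection(*kmer_sets))


# Invented sequences for three hypothetical coregulated genes.
TOY_PROMOTERS = {
    "geneA": "TTGACGTCATAGC",
    "geneB": "CCATGACGTCAGG",
    "geneC": "GGTGACGTCATTT",
}

print(shared_kmers(TOY_PROMOTERS))
```

Candidate elements found this way would still have to be checked against in vivo binding data, such as the chromatin immunoprecipitation experiments mentioned above.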

How big is the core proteome of humans? 
Vertebrates have many gene families with 
three or four members: the HOX clusters, 
calmodulins, Ezrins, Notch receptors, nitric 
oxide synthases, syndecans, and NF1 tran- 
scription factor genes are some examples 
(58). This is evidence for two genome dou- 
blings during mammalian evolution, super- 
imposed on which were the amplifications 
and contractions over evolutionary time that 
uniquely characterize each lineage (59). The 
human genome, with 80,000 or so genes, is 
likely to be an amplified version of a very much 
smaller genome, and its core proteome may not 
be much larger than that of the fly or worm; that 
is, the more complex attributes of a human 
being are achieved using largely the same 
molecular components. The evolution of ad- 
ditional complex attributes is essentially an 
organizational one; a matter of novel interac- 
tions that derive from the temporal and spa- 
tial segregation of fairly similar components. 

Finally, approximately 30% of the predicted 
proteins in every organism bear no similarity to 
proteins in its own proteome or in the pro- 
teomes of other organisms. In other words, 
sequence similarity comparisons consistently 
fail to give us information about nearly a third 
of the components that make every organism 
uniquely itself. What does this mean with re- 
spect to the evolution and function of these 
proteins? Does each genome contain a sub- 
population of very rapidly evolving genes? 
One-third of randomly chosen cDNA clones do
not cross-hybridize between D. melanogaster
and Drosophila virilis (60). Even though these
are distantly related species, they are develop-
mentally and morphologically very similar.
Crystallographic data will be needed to deter-
mine whether these proteins that have diverged 
in primary sequence have maintained their 
three-dimensional structures or have diverged 
so far that new folds and domains have formed. 

Our first look at the annotated fly genome 
provokes these and other questions. Access to 
the genomic sequence will help us design the 
experiments needed to answer them. The rel- 
ative simplicity and manipulability of the fly 
genome means that we can address some of 
these biological questions much more readily 
than in vertebrates. That is, after all, what 
model organisms are for.

A new β1-containing integrin was isolated from rat
liver by affinity chromatography on Sepharose conju-
gated with the peptide GRGDSPC. The interaction was
weakened but not abolished when the arginine and/or
aspartic acid in the peptide were replaced with lysine
and glutamic acid, respectively. In contrast, the cys-
teine was necessary for binding of the integrin. The β1-
associated protein, referred to as α9, had an N-terminal
amino acid sequence related to but distinct from previ-
ously described integrin α-subunits. In addition, an in-
ternal peptide sequence was obtained which confirmed
that the protein is a new member of the family of inte-
grin α-subunits. An antiserum raised against a syn-
thetic peptide corresponding to amino acids 1-16 of α9
reacted specifically with this protein and was used to
identify α9 in several tissues. The integrin α9β1 was not
retained on Sepharose conjugated with Engelbreth-
Holm-Swarm tumor (EHS)-laminin, collagen type I, or
a 105-kDa cell-binding fragment of fibronectin. How-
ever, it did bind specifically to EHS-laminin and colla-
gen type I adsorbed to plastic microtiter wells. The sites
of the interactions were localized to fragment E8 of
EHS-laminin and to cyanogen bromide fragment 8 of
collagen α1(I) and were not inhibited by soluble RGD-
containing peptides. The results indicate that α9β1 is a
widely distributed laminin/collagen receptor which
may have additional, yet unidentified ligands. © 1994
Academic Press, Inc.


INTRODUCTION 

The integrins are a family of heterodimeric cell sur- 
face glycoproteins which mediate interactions between 
cells and the extracellular matrix (ECM) and in some 
cases take part in cell-cell adhesion [1-3]. These recep- 
tors are composed of noncovalently associated α- and
β-subunits, which both span the plasma membrane. So
far, 13 α- and 8 β-subunits have been described, which
are capable of forming at least 19 different hetero- 
dimers. In addition, several variants generated by alter- 

1 To whom correspondence and reprint requests should be ad- 
dressed. Fax: (46) -18-650762. 


native splicing of the mRNAs are known. This multiplic- 
ity of ECM receptors allows cells to recognize and re- 
spond to alterations in the composition of ECMs. The 
cytoplasmic domains of the integrin subunits have been
strongly conserved during evolution, indicating that 
they interact with other conserved structures inside the 
cells. Since all of the known integrin subunits have dis- 
tinct cytoplasmic domains, they may have different 
functions, e.g., be parts of separate signaling pathways, 
or be targets for regulation by different mechanisms. 

A number of integrins are known to interact with the 
tripeptide Arg-Gly-Asp (RGD). This motif was discov- 
ered as a cell-binding structure in fibronectin (FN) [4] 
and has later been found in many other proteins of the 
ECM. Substitution of single amino acids in the tripep- 
tide usually abolishes binding of receptors [5], although 
variants of the peptide have been shown to be functional 
in some cases [6, 7]. In addition, the amino acids 
surrounding the RGD sequence are of importance, in- 
fluencing the specificity and affinity for the receptors 
[8-12]. The detailed structural requirements for the interactions
between integrins and RGD-containing proteins are not fully
understood. One interesting suggestion is that the aspartic acid
in the RGD motif may be
involved in the coordination of divalent cations which 
are known to be necessary for binding of integrins to 
their ligands [13]. 

Several cell adhesion proteins of the extracellular 
matrix, e.g., FN, laminin (LN), and collagen, seem to 
possess RGD-dependent as well as RGD-independent
binding sites for integrins [14-19]. Conversely, some in- 
tegrins recognize both RGD-containing determinants 
and protein structures lacking an RGD sequence [20, 
21]. In this study, a novel integrin was isolated which 
seems to have such a dual ligand specificity. 

MATERIALS AND METHODS 

Proteins and peptides. FN was purified from human plasma as
previously described [22]. FN was digested with chymotrypsin and
the cell-binding fragment (105 kDa) was isolated as described [23].
Vitronectin (VN) was a gift from Dr. Björn Dahlbäck (Lund,
Sweden), fibrinogen was from KABI-Pharmacia (Stockholm), and
collagen type I was from Vitrogen. The collagen α1(I)-derived CNBr
fragments CB3, CB7, and CB8 were gifts from Dr. Kristofer Rubin
(Uppsala, Sweden). LN, isolated from the Engelbreth-Holm-Swarm
(EHS) tumor, and its fragments E8 and P1 [24] were gifts from Dr.
Mats Paulsson (Bern, Switzerland). The peptides GRGDSPC,
GRGESPC, GKGDSPC, GKGESPC, GRGDSP, and YN1DAQRPVRFQGPPGC
(α9(1-16)) were synthesized by Fmoc chemistry utilizing
activation with HBTU [25]. The peptides were deprotected by
reagent K [26], precipitated with diethylether, and dried under
vacuum. The peptides were analyzed by plasma desorption mass
spectrometry to verify the predicted masses and used without
further purification. Peptide GRGDS was obtained from
Calbiochem-Novabiochem. α9(1-16) corresponds to the N-terminal
amino acids 1-16 of the integrin subunit α9 (see Fig. 3A) with an
additional cysteine at position 17.

Proteins were coupled to CNBr-activated Sepharose and peptides
were coupled to activated CH-Sepharose in 0.1 M NaHCO3, pH 8.0,
0.15 M NaCl according to the recommendations of the manufacturer
(Pharmacia-LKB Technology). During the coupling procedure of
cysteine-containing peptides to CH-Sepharose, samples were
continuously taken from parallel incubations of peptides in 0.1 M
NH4HCO3, pH 8.0, and analyzed by mass spectrometry to monitor
dimerization of peptides due to oxidation of the thiol group. No
dimerization of the peptides in these experiments was detected
even after prolonged incubations (90 min) at room temperature (not
shown). Further, a sample of the GRGDSPC-Sepharose was eluted with
reducing agent (1 mg/ml DTT). No peptides were detected in the DTT
eluate by mass spectrometry.

Antibodies. The antisera against the β1- and α5-subunits have
been described previously [27, 28]. The rabbit antisera against the
integrin subunits β3 and αv were kind gifts from Dr. Åke Oldberg
(Lund, Sweden) and Dr. James Gailit (La Jolla, CA), respectively. An
antiserum against α9 was produced by immunizing a rabbit with the
α9(1-16) peptide. Each injection, given intramuscularly at 2-week
intervals, contained Freund's adjuvant mixed with 150 µg of free
α9(1-16) and 350 µg of ovalbumin conjugated with the peptide by
maleimidobenzoyl-N-hydroxysuccinimide as described [29].

Electrophoresis and immunoblotting. Polyacrylamide gel
electrophoresis in sodium dodecyl sulfate (SDS-PAGE) was performed
in 7% gels [30]. Immunoblot analysis of proteins electrophoretically
transferred to nitrocellulose after SDS-PAGE was performed as
described [31]. The nitrocellulose filters were incubated with
antiserum at a 1:50 dilution and the specifically bound antibodies
were allowed to react with 125I-labeled protein A. In some
experiments, the enhanced chemiluminescence method was used
instead. In these cases, the protein A step was replaced by
incubation with peroxidase-conjugated anti-rabbit IgG at a 1:5000
dilution under conditions recommended by the manufacturer
(Amersham). For detection of recognized antigens by either method,
the nitrocellulose filters were exposed to X-ray films (Fuji). The
molecular weights of the blotted proteins were estimated by
relating their positions to those of marker proteins transferred
from the polyacrylamide gels.

Purification of integrins. Seven to ten rat livers were homogenized
on ice in 200 ml of a buffer containing 10 mM Tris/HCl, pH 7.4, 10
mM EDTA, 2% Triton X-100, 2 mM PMSF, and 1 mg/ml pepstatin A. The
homogenate was centrifuged at 25,000g for 45 min and the
supernatant was applied at a flow rate of 50 ml/h to a wheat germ
agglutinin (WGA)-Sepharose (50 ml bed volume, 260 mg WGA). The
column was washed with a buffer containing 10 mM Tris/HCl, pH
7.4, 50 mM NaCl, 0.2% Triton X-100, 0.2 mM PMSF, 1 mg/ml
pepstatin A, and 2 mM MnCl2 (column buffer) and eluted in the same
buffer containing 0.3 M N-acetylglucosamine. The eluted proteins
were applied at 5 ml/h to columns of Sepharose conjugated with the
105-kDa FN fragment (10 mg coupled to 1.5 ml of Sepharose),
collagen I (20 mg coupled to 3 ml of Sepharose), EHS-LN (10 mg
coupled to 1.5 ml of Sepharose), or different RGD-containing
peptides (4 mg of peptides coupled to 0.6 ml of Sepharose). After
washing, integrins were eluted off the columns with 10 mM EDTA in
10 mM Tris/HCl, pH 7.4, 150 mM NaCl, 0.2% Triton X-100, 0.2 mM
PMSF, and 1 mg/ml pepstatin A. All steps after the tissue
homogenization were performed at 4°C. The eluted integrins were
analyzed by SDS-PAGE. For 125I-labeling of the integrins, the
chloramine-T method using Iodobeads (Pierce Chemical Co.) was
applied.

Amino acid sequence determination. Amino-terminal sequencing
of isolated integrins was performed after separation of the α- and
β-subunits in 5% SDS-PAGE and transfer of the proteins to PVDF
membranes (Millipore) [32]. To prevent blockage of the amino
terminal during electrophoresis, thioglycolate was added to the gel
solution (0.1 mM), to the upper electrophoresis buffer (0.05 mM),
and to the sample buffer (4 mM). After staining of the PVDF
membrane with Coomassie BB, the bands were cut out and subjected to
amino acid sequencing in a gas-phase sequenator (Applied Biosystems
Model 476). For internal amino acid sequencing, the integrin
subunit was incubated with a lysine-specific enzyme from
Achromobacter lyticus followed by separation of obtained fragments
by reverse-phase chromatography on a C-4 column (2.1 × 30 mm,
Aquapore, Brownlee). The repetitive yield was between 92 and 95%
for all reactions of the amino acid sequences presented.

Solid-phase receptor assay. Wells of plastic ELISA microtiter
plates (Dynatech Laboratories, Inc.) were coated with the indicated
proteins in 150 µl coating buffer (10 mM Tris, pH 7.4, 150 mM NaCl,
1 mM CaCl2, 1 mM MgCl2) overnight at 4°C. Subsequently, the wells
were incubated with the same buffer containing 1.5% bovine serum
albumin (BSA) for 2 h at 22°C and washed twice with binding buffer
(10 mM Tris, pH 7.4, 50 mM NaCl, 1 mM CaCl2, 1 mM MgCl2, 0.1%
Triton X-100). The 125I-labeled integrin was incubated in the wells
in binding buffer for 90 min at room temperature. After washing
with binding buffer, bound receptors were released by the addition
of 1 ml 1% SDS and quantified in a gamma counter.
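
The counts recorded in this assay can be converted to receptor mass once the specific activity of the labeled integrin is known. The following is a minimal sketch, assuming the values quoted in the legend to Fig. 7 (10,000 cpm/ng, 100,000 cpm added per well, up to 300 cpm background on BSA). The function names are illustrative and are not part of the original methods.

```python
def cpm_to_ng(cpm: float, specific_activity_cpm_per_ng: float) -> float:
    """Convert measured radioactivity (cpm) to mass of labeled integrin (ng)."""
    if specific_activity_cpm_per_ng <= 0:
        raise ValueError("specific activity must be positive")
    return cpm / specific_activity_cpm_per_ng


def percent_of_input(bound_cpm: float, input_cpm: float) -> float:
    """Express bound radioactivity as a percentage of the amount added per well."""
    if input_cpm <= 0:
        raise ValueError("input must be positive")
    return 100.0 * bound_cpm / input_cpm


# Values taken from the legend to Fig. 7 (assumed to apply to every well).
SPECIFIC_ACTIVITY = 10_000  # cpm per ng of labeled integrin
INPUT_CPM = 100_000         # cpm added per well
BACKGROUND_CPM = 300        # upper end of the reported BSA background

input_ng = cpm_to_ng(INPUT_CPM, SPECIFIC_ACTIVITY)
background_pct = percent_of_input(BACKGROUND_CPM, INPUT_CPM)
print(f"input per well: {input_ng:.1f} ng")
print(f"BSA background: {background_pct:.2f}% of input")
```

Under these assumptions each well received about 10 ng of labeled integrin, and the BSA background corresponds to roughly 0.3% of the input.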

Cell attachment assay. Hepatocytes were isolated by collagenase
perfusion of the liver and used for cell attachment studies as
described [17].


RESULTS 

Purification and Identification of Integrin Dimers 

By use of affinity chromatography, one major FN receptor,
integrin α5β1, and one collagen/LN receptor, integrin α1β1, have
previously been identified on rat hepatocytes [14, 17, 19]. While
the integrin α5β1 recognizes the RGD sequence in FN [5, 14], the
integrin α1β1 binds to collagen type I and EHS-LN independently of
RGD sites. By cell attachment experiments there were indications
for additional LN receptors on hepatocytes, some of which appeared
to recognize RGD (Forsberg and Johansson, unpublished
observations). In attempts to isolate RGD-binding proteins from rat
liver, the solubilized tissue was applied to Sepharose conjugated
with different RGD-containing peptides. When a synthetic peptide
consisting of the FN sequence GRGDSP with an additional cysteine at
the C-terminus was used as affinity matrix, four major proteins
were recovered in the EDTA eluate. The apparent Mr values of these
proteins in SDS-PAGE were 145/130/115/90 kDa under nonreducing
conditions and 160/130/110/100 kDa after reduction (Fig. 1A). To
identify the bands, the GRGDSPC-binding proteins were analyzed by
immunoblotting. As shown in Fig. 1B, the bands with an Mr of 130
and 90 kDa (nonreduced) reacted with an antiserum against αvβ3.
The 90-kDa band was also recognized by an antiserum specific for
the β3-subunit. The 115-kDa component was recognized by an
antiserum against the β1-subunit. Based on the known apparent Mr
in SDS-PAGE of β1, αv, and β3, they would correspond to the
reduced bands of 130, 110, and 100 kDa, respectively. However, none
of these or other tested antisera reacted with the 145-kDa band.
Thus, the GRGDSPC-binding fraction contained the integrin αvβ3 and
the integrin β1-subunit associated with an unidentified α-subunit
(145 kDa nonreduced, 160 kDa reduced).

The two heterodimers were found to be separated when applied to a
GRGDS-Sepharose. As shown in Fig. 2, the integrin αvβ3 was
retained on the GRGDS-Sepharose (lanes 3, 4), while all of the
unidentified β1-integrin passed through the column and could
subsequently be recovered essentially free from αvβ3 by use of the
GRGDSPC-Sepharose (lanes 5, 6). For comparison, α5β1 eluted from
Sepharose conjugated with a 105-kDa FN fragment (lanes 1, 2) and
α1β1 isolated on collagen-Sepharose (lanes 7, 8) are shown. These
two integrins did not bind to any of the peptide columns, as
expected, since α5β1 requires larger FN fragments for stable
binding, and α1β1 recognizes yet unidentified structures which
differ from RGD. Conversely, the peptide-binding integrins did not
bind to Sepharose conjugated with collagen or to the 105-kDa FN
fragment, which contains a GRGDSPA sequence. A 70/75-kDa protein
(nonreduced/reduced) was obtained in variable amounts from the
GRGDS column (visible in lane 4). The protein was not obtained from
purified hepatocytes, but was recovered when blood was subjected to
the purification procedure (not shown). This component was not
further studied.

FIG. 1. Affinity chromatography of liver extract on
GRGDSPC-Sepharose. (A) WGA-binding proteins isolated from rat liver
as described under Materials and Methods were applied to a column
of Sepharose conjugated with GRGDSPC peptide. After washing of the
column, proteins were eluted with buffer containing EDTA, subjected
to SDS-PAGE in unreduced (lane 1) or reduced form (lane 2), and
stained with silver. The migration of size marker proteins and
their apparent Mr (kDa) are indicated. The four major protein bands
of the samples are marked with dots. (B) The proteins eluted from
GRGDSPC-Sepharose (shown in lane 1) were analyzed in reduced form
by immunoblotting as described under Materials and Methods. The
antisera used were anti-αvβ3 (lane 1), anti-β3 (lane 2), anti-β1
(lane 3), and preimmune serum (lane 4). The migration of the size
marker proteins is indicated.

FIG. 2. SDS-PAGE of integrins isolated from rat liver.
WGA-binding proteins from rat liver were applied sequentially to
four different columns. EDTA-eluted proteins from the Sepharose
columns conjugated with a 105-kDa FN fragment (lanes 1 and 2),
GRGDS peptide (lanes 3 and 4), GRGDSPC peptide (lanes 5 and 6), and
collagen type I (lanes 7 and 8) were analyzed by SDS-PAGE in
unreduced (lanes 1, 3, 5, 7) and reduced (lanes 2, 4, 6, 8) forms.
The migration of size marker proteins and their apparent Mr (kDa)
are indicated.

In order to identify the α-subunit, it was isolated from 50 livers
by the procedure shown in Fig. 2 followed by preparative SDS-PAGE
to remove the β1-subunit. After transfer to PVDF membrane the
N-terminal amino acid sequence of the protein was determined. A
search in the Swiss Protein Database revealed that the
20-amino-acid-long sequence obtained was related to the N-terminal
parts of integrin α-subunits (Fig. 3A). Among the subunits known to
associate with β1, the number of identical amino acids was the
highest for α5 (60%) and the lowest for α1 and α2 (35%). The
relatively low degree of homology makes it unlikely that α9 is the
rat counterpart to any of the known human integrin subunits.
Furthermore, a polyclonal antiserum raised against rat α5 did not
recognize the 145/160-kDa subunit (Fig. 5). An internal amino acid
sequence of the protein (Fig. 3B) also showed some homology to
human integrin α-subunits, but was clearly distinct. Since the
protein has unique amino acid sequences at two different positions,
it is unlikely to be a splice variant of a previously described
integrin α-subunit. According to the present nomenclature it is
referred to as α9 in the following.

Amino acid sequencing of the integrin subunit recognized by
antibodies against β3, and of an internal peptide generated from
the integrin subunit reacting with antibodies against β1, confirmed
that they are the rat homologues to these human proteins (Fig. 4).
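
Identity figures such as those quoted in this section are position-by-position counts over an alignment. The following is a minimal sketch using the two β3 N-terminal sequences listed in Fig. 4. The function name and the choice to skip gapped positions are assumptions made for illustration, not the authors' stated procedure.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues between two pre-aligned sequences.

    Positions where either sequence carries a gap character are skipped,
    so the denominator is the number of compared residue pairs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# N-terminal β3 sequences as listed in Fig. 4 (residues 1-14).
human_b3 = "GPNICTTRGVSSCQ"
rat_b3 = "ESNICTTRGVKSCQ"
print(f"{percent_identity(human_b3, rat_b3):.1f}% identical")
```

For these two 14-residue stretches, 11 positions match, giving about 79% identity.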

Immunization with a synthetic peptide (α9(1-16)) made from the
N-terminal sequence of α9 (amino acids 1-16) resulted in an
antiserum which reacted with α9 in immunoprecipitation (data not
shown) and in immunoblotting as shown in Fig. 5. It recognized both
the unreduced and the reduced forms of α9. After long exposure, a
faint reactivity with α5 could also be detected, possibly due to a
stretch of four identical amino acids in the α5 and the α9(1-16)
peptides (amino acids 13-16, see Fig. 3A). No cross-reactivity with
α1, αv, β1, or β3 was observed (not shown).

FIG. 3. Comparison of amino acid sequences of β1-associated
integrin α-subunits. The N-terminal amino acid sequence (A) and an
internal amino acid sequence (B) of the integrin α9-subunit from
rat are shown at the top. Below are the rat subunit α1 and the
human subunits α2-α8 and αv with amino acids identical to α9 marked
with an asterisk. Gaps (-) are inserted to optimize homologies; (h)
and (r) denote human and rat sequences, respectively. Information
on the human amino-terminal sequences was from Kramer et al. 1991
[37] for α2-α7 and αv and from Bossy et al. 1991 [51] for α8. The
other human sequences were from Takada et al. 1991 [52] for α2-α6,
α8, and αv and from Song et al. 1992 [53] for α7. The rat α1
sequence was from Ignatius et al. 1990 [34]. The location of the αv
peptide in the protein is indicated by the number of the C-terminal
serine.

Purification of Integrins from Different Organs 

To determine if the integrin α9β1 was expressed in other tissues
than liver, the same fractionation procedure was also applied to
spleen, kidney, heart muscle, and skeletal muscle. From all organs
tested, an integrin was eluted from the GRGDSPC column which
migrated identically to α9 in SDS-PAGE and which reacted with the
anti-α9 antiserum in immunoblotting (not shown). Estimation of the
yields of integrins α5β1 and α9β1 (Table 1) shows that
approximately equal amounts of α9β1 were isolated from all organs
except liver, from which significantly more α9β1 was obtained. The
integrin α5β1 was found to be even more abundant in the liver
relative to the other organs. We also isolated α1β1 and αvβ3 from
these organs; α1β1 was obtained in the largest amount from skeletal
muscle, while αvβ3 was obtained in small but detectable amounts
from all organs (data not shown). Although this method of
quantification may not accurately reflect the actual amount of the
integrins in the different organs, due to the semiquantitative
nature of the technique and different recoveries of membrane
proteins from different organs, it serves to illustrate that α9β1
has a widespread distribution.

         499                519
β1 (h)   HCECSTDEVNSEDMDAVCRK
β1 (r)   HCECSTDEVNSEDMDAYCRK

         1            14
β3 (h)   GPNICTTRGVSSCQ
β3 (r)   ESNICTTRGVKSCQ

FIG. 4. Comparison of the rat β1 and β3 amino acid sequences
with their human counterparts. The proteins from rat liver which
copurified with α9 and αv, respectively, were sequenced as described
under Materials and Methods and compared to their human
counterparts; (h) and (r) denote the human sequence and rat
sequence, respectively. Information on the human sequences was from
Moyle et al. 1991 [54].

Peptide Specificity 

Different variants of the GRGDSPC peptide were synthesized,
coupled to Sepharose, and used in affinity chromatography of liver
extracts. As shown in Fig. 6, a GRGESPC peptide was almost as
efficient as the GRGDSPC peptide in the binding of α9β1. The
GKGDSPC and GKGESPC peptides, where the arginine residue of the two
peptides above was substituted with lysine, had lower but still
detectable binding activity. The cysteine, however, was critical; a
GRGDSP peptide did not bind α9β1 at all. The identity of the
material eluted from the peptide-Sepharoses as α9β1 was confirmed
by immunoblotting with the anti-α9 antiserum (not shown).

FIG. 5. Immunoblotting with anti-α9 antibodies. Integrins α5β1
(lanes 3, 4, 5, 6) and α9β1 (lanes 1, 2, 7, 8) were run on SDS-PAGE
in unreduced form (lanes 1, 3, 5, 7) and in reduced form (lanes 2,
4, 6, 8) and transferred to nitrocellulose filter. The filter was
cut into halves which were incubated with antisera against the
α9-subunit (lanes 1-4) and the α5-subunit (lanes 5-8),
respectively. The immunoreactive proteins were visualized by ECL.
The migration of size marker proteins and their apparent Mr (kDa)
are indicated.

No further proteins were released from the 
GRGDSPC-Sepharose by reducing agent (1 mg/ml 
DTT) after the EDTA elution, as monitored by silver 
staining of SDS-polyacrylamide gels (not shown). This 
result indicates that the binding of α9β1 is not mediated
by another protein bound to the thiol group of the pep- 
tide. 

Ligand Specificity 

By the use of a solid-phase binding assay, the ability of
different ECM proteins to function as ligands for the integrin α9β1
was tested. EHS-LN and collagen type I adsorbed to plastic
microtiter wells were found to bind 125I-labeled α9β1 in a
concentration-dependent manner, whereas the level of binding of
α9β1 to FN and fibrinogen was close to the background binding to
BSA. VN bound some α9β1 when coated at low concentrations, but at
increasing ligand density less binding of receptor was obtained
(Fig. 7A). A 50-fold excess of unlabeled α9β1 reduced the binding
of 125I-labeled α9β1 to LN and collagen by more than 60%, while the
GRGDSPC peptide at concentrations up to 1 mg/ml did not affect the
binding significantly (data not shown). To identify regions in
EHS-LN and collagen I that interact with α9β1, different fragments
from the proteins were used in the binding assay (Fig. 7B). The
integrin was found to bind to a fragment corresponding to the long
arm of LN (E8), but not to the central cross domain of LN (P1). Of
the three tested CNBr fragments of collagen α1(I), the CB8 fragment
specifically bound the integrin, while the CB3 and the CB7
fragments were inactive. A triple-helical structure of CB8 and
collagen was apparently required for the interaction with α9β1,
since thermal denaturation at 50°C for 10 min prior to coating to
the plastic wells at 37°C reduced the binding by >80 and 70%,
respectively (not shown). These LN and collagen fragments were all
active in promoting attachment of hepatocytes (Fig. 8),
demonstrating that receptor-binding structures were also available
in the plastic-adsorbed fragments which did not bind α9β1.

FIG. 6. Specificity of peptide binding of α9β1. Variants of the
GRGDSPC peptide were synthesized, coupled to Sepharose, and used
in affinity chromatography of liver extracts. Equal volumes of the
eluate from GRGDSPC-, GKGDSPC-, GKGESPC-, and GRGESPC-Sepharose
were applied (in reduced form) to lanes 1, 2, 3, and 4,
respectively, of an SDS-polyacrylamide gel. After electrophoresis,
the proteins were stained with silver. The migration of size marker
proteins and their apparent Mr (kDa) are indicated.

TABLE 1

Purification of Integrins from Different Organs

         Kidney   Heart   Liver   Spleen   Muscle
α9β1     +        +       ++      +        +
α5β1     +        +++     ++++    +        +

Note. Equal amounts of the indicated organs (12 g) were processed
for isolation of integrins as described under Materials and Methods.
The yield of purified proteins was measured by densitometric
scanning of the β1 band after SDS-PAGE and silver staining. For this
purpose, an Ultroscan XL enhanced laser densitometer with the 2400
Gelscan software version 2.0 (Pharmacia) was used. The amount of
α5β1 obtained from liver was set to 100% (++++), and the symbols
+++, ++, and + denote ~40, ~10, and 2-6% of this amount,
respectively.
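
The semiquantitative scale used in Table 1 can be expressed as a small lookup. The sketch below assumes that each densitometric reading is normalized to the α5β1 yield from liver and then assigned to the nearest category described in the table note. The anchor value of 4% for "+" (the midpoint of 2-6%), the nearest-category rule, and the readings themselves are assumptions made for illustration only.

```python
# (approximate percent of reference, symbol), following the note to Table 1.
# The 4.0 anchor for "+" is an assumed midpoint of the stated 2-6% range.
ANCHORS = [(100.0, "++++"), (40.0, "+++"), (10.0, "++"), (4.0, "+")]


def relative_yield(band_signal: float, reference_signal: float) -> float:
    """Band signal as a percentage of the reference band signal."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * band_signal / reference_signal


def yield_symbol(percent: float) -> str:
    """Return the symbol whose anchor percentage is closest to `percent`."""
    return min(ANCHORS, key=lambda anchor: abs(anchor[0] - percent))[1]


# Hypothetical densitometer readings in arbitrary units (not from the paper).
reference = 2500.0
for reading in (2500.0, 1000.0, 250.0, 100.0):
    pct = relative_yield(reading, reference)
    print(f"{pct:5.1f}% -> {yield_symbol(pct)}")
```

A nearest-anchor rule is only one way to bin the readings; fixed cut-offs between the categories would serve equally well if the original thresholds were known.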

DISCUSSION 

In this paper we describe the purification and initial
characterization of an integrin from rat liver containing the
β1-subunit in association with a previously unidentified α-subunit.

The novel integrin subunit, which migrates similarly to α2 in
SDS-PAGE, was observed in the EDTA eluate from GRGDSPC-Sepharose
during affinity chromatography of solubilized rat liver. Since no
integrin of the β1-group except αvβ1 [33] has been shown to bind
short RGD-containing peptides in affinity chromatography, the α-
and β-subunits were subjected to amino acid sequencing. A peptide
sequence obtained for the β-subunit was identical to the human β1,
and the two 20-amino-acid-long sequences obtained from the
α-subunit showed similarity with other integrin α-subunits, but
were clearly distinct from previously known proteins. A comparison
of the peptide sequences with the known β1-associated α-subunits
showed that the number of identical amino acids was in the same
range as that obtained when the corresponding regions of the
different human α-subunits are compared (35-60% for the
N-terminus). The new subunit, called α9, is not likely a rat
counterpart to any of the human α-subunits since the degree of
sequence similarity between homologous integrin subunits from
different species is higher. For example, the identity between the
rat and human α1-sequences is 80% [34], between the human and the
chicken αv-subunits 80% [35, 36], and between the human and the
mouse α7 it is 92% [37]. Furthermore, the highest similarity of α9
(N-terminus) was to α5, which can be distinguished from α9 based on
migration in SDS-PAGE (Fig. 2) and antibody reactivity (Fig. 5).
The α9-subunit belongs to the group of integrin subunits which do
not have a light chain, as reduction resulted in slower migration
in SDS-PAGE and no release of a low-molecular-weight fragment.

FIG. 7. Solid-phase receptor assay. 125I-Labeled α9β1 (10,000
cpm/ng) was incubated in microtiter wells (100,000 cpm/well) as
described under Materials and Methods. The wells were previously
coated with (A) EHS-LN, collagen type I, plasma FN, fibrinogen, VN
or (B) fragments of LN (E8 and P1) and collagen I (CB3, CB7, and
CB8), respectively, at the indicated concentrations. After washing,
bound radioactivity was quantified in a gamma counter. Background
binding to BSA-blocked wells was 250-300 cpm. The lowest coating
concentration tested was 0.1 µg/ml.

[Plot; x-axis: Coated ligand (µg/ml)]

FIG. 8. Attachment of hepatocytes to LN and collagen fragments.
Hepatocytes (4 × 10^5 cells) were seeded in wells (10 mm in
diameter) coated with the indicated concentrations of the fragments
P1, E8 (from EHS-LN), CB3, CB7, and CB8 (from collagen α1(I)),
respectively. The lowest coating concentration tested was 0.1
µg/ml. After incubation at 37°C for 60 min, the dishes were washed
[17] and the number of attached cells was determined by the
hexosaminidase assay [55]. The number of cells attached to dishes
coated with 10 µg/ml of intact collagen I was set to 100%.

The binding specificity was analyzed with different peptides
similar to the GRGDSPC peptide, coupled to Sepharose. The RGD
sequence was not absolutely required for binding of α9β1; peptides
in which the aspartic acid was changed to a glutamic acid bound
α9β1 almost as well as the GRGDSPC peptide. Substitution of
arginine with a lysine resulted in reduced binding activity of the
peptides (GKGDSPC and GKGESPC), but α9β1 in low amounts could still
be recovered also from these affinity matrices. However, the
cysteine was critical for the interaction since a GRGDSP peptide
lacked binding activity. The cysteine-containing peptides coupled
to the columns existed as monomers, as shown by the mass
spectrometry analysis. Recently the integrin α2β1 was reported to
interact with cyclic, but not with linear, RGD-containing peptides
[8]. The authors suggested that a rigid conformation of the peptide
was needed in order to bind α2β1. Even if a very small amount of
the peptides on the column we used occurred as dimers formed by a
disulfide bridge between two peptides coupled to the column, the
interaction between α9β1 and the peptides is not dependent on a
rigid conformation since soluble peptides could elute the integrin
from the column (data not shown).

The specificity of the integrin α9β1 for potential physiological
ligands was investigated using a solid-phase receptor assay. As the
integrin is widespread and could be purified from all organs
analyzed, we tested extracellular proteins with a widespread
distribution. FN, VN, and fibrinogen, proteins known to bind
integrins by the use of RGD sequences, were found to bind α9β1
poorly. Surprisingly, 125I-labeled α9β1 bound specifically to
EHS-LN and collagen I in this assay. The regions of EHS-LN and
collagen α1(I) that interacted with the integrin were located on
the E8 fragment of LN and on the CB8 fragment of collagen α1(I). In
contrast, α9β1 was not retained during affinity chromatography on
Sepharose conjugated with EHS-LN or collagen I, illustrating that
results obtained by these techniques have to be interpreted with
caution.

Two of the fragments which did not bind α9β1, CB7 and P1, contain
an RGD sequence, while both of the α9β1-binding fragments lack this
motif. Further, the binding of 125I-labeled α9β1 to LN- or
collagen-coated plastic could not be inhibited by soluble
RGD-containing peptides, indicating that other or additional
mechanisms are involved in the interactions. Collagen- and
LN-binding integrins have in most cases been found not to recognize
RGD sequences. However, RGD-dependent binding of integrins to
native collagen has been described [8, 18], and αvβ3 and α5β1 are
probably able to interact with RGD sequences in collagen if the
collagen molecule is denatured [19, 38]. Similarly, RGD-dependent
binding of cells to LN has been shown [16, 39, 40], although there
are indications that the RGD site in LN is cryptic in the native
molecule [41]. αvβ3, which usually binds its ligands via RGD
sequences, has been implicated as a LN receptor [16], but in one
study the interaction of this integrin with LN was concluded to be
RGD-independent [21].

The ligand specificity of α9β1 was also surprising in view of the
ability of antibodies against the integrin α1-subunit to completely
block hepatocyte attachment to collagen type I [19]. These results
indicate that α1β1 is a dominant receptor on hepatocytes during
formation of initial contacts with collagen type I. The same α1
antibody only marginally inhibits the binding of hepatocytes to LN
in the same type of experiment (Forsberg and Johansson, unpublished
observations), suggesting that LN receptors other than α1β1 are
active on these cells. The integrin α9β1 is a possible candidate
for such a function. To investigate if α9β1 was present on
hepatocytes, rat livers were perfused and the different cell types
were separated as previously described [17]. α9β1 could be isolated
both from hepatocytes and from the nonparenchymal fraction
containing endothelial cells, Ito cells, and Kupffer cells (data
not shown).

To search for natural ligands to this integrin we also looked for
the presence of sequences similar to GRGDSPC in extracellular
matrix proteins. Human collagen type I contains two RGD sequences
in the α1(I) chain and three RGD sequences in the α2(I) chain [42,
43]. At least two of these are not conserved between species and
none of them has a cysteine close to the RGD sequence.
Alternatively, the binding site could be built up from parts of
more than one collagen chain in the triple-helical molecule. The
only RGD sequence in mouse EHS-LN is located in the P1 region, and
it is not followed by a cysteine. Somewhat related sequences are
present in S-LN (GDAPC) [44] and in the B2 chain of human LN
(RGCTPC) [45]. Examples of other proteins with similar sequences
are thrombin (RGDAC) [46], fibulin (RGGGPC) [47], thrombospondin
(RGDAC) [48], and nidogen (RGDGQTC) [49]. The two latter proteins
both have a number of repeats containing RGD-like sequences
followed by cysteines [49, 50].

In summary, we have purified a new member of the integrin family,
α9β1, which is present in several organs. α9β1 is efficiently
purified on Sepharose conjugated with a cysteine-containing RGD
peptide. EHS-LN and collagen type I are two candidates as
physiological ligands for this receptor based on solid-phase
binding data. Since neither of these proteins binds α9β1 in an
RGD-dependent manner, additional ligands are likely to exist. 2

Summary

Transforming growth factor β (TGFβ) family members are secreted in
inactive complexes with a latency-associated peptide (LAP), a
protein derived from the N-terminal region of the TGFβ gene
product. Extracellular activation of these complexes is a critical
but incompletely understood step in regulation of TGFβ function in
vivo. We show that TGFβ1 LAP is a ligand for the integrin αvβ6 and
that αvβ6-expressing cells induce spatially restricted activation
of TGFβ1. This finding explains why mice lacking this integrin
develop exaggerated inflammation and, as we show, are protected
from pulmonary fibrosis. These data identify a novel mechanism for
locally regulating TGFβ1 function in vivo by regulating expression
of the αvβ6 integrin.

Introduction 

The transforming growth factor β (TGFβ) family consists
of three closely related isoforms (TGFβ1, -2, and -3)
that are prototypes of the larger TGFβ superfamily. In
vitro, TGFβs exert nearly identical effects that can be
grouped into three broad areas: modulation of inflam-
matory cell function, growth inhibition and differen-
tiation, and control of extracellular matrix production.
Studies of animal models as well as human clinical spec-
imens strongly suggest that TGFβs are important in the
pathogenesis of several diseases, including fibrotic con-
ditions (Broekelmann et al., 1991; Border et al., 1992;
Sime et al., 1997). TGFβ1 knockout mice develop diffuse
mononuclear cell infiltrates that prove lethal within a few
weeks from birth (Shull et al., 1992; Kulkarni et al., 1993).
In contrast, TGFβ2 and TGFβ3 knockout mice display
only developmental defects (Kaartinen et al., 1995; San-
ford et al., 1997). Major differences among TGFβ isoform

To whom correspondence should be addressed (e-mail: deans@itsa.ucsf.edu).

functions in vivo are due at least in part to differences
in the promoter regions of the various isoform genes
(Taipale et al., 1998). It is also possible, but not proven,
that there are TGFβ isoform-specific mechanisms for
converting latent TGFβs to the active forms.

The TGFβs are secreted as complexes composed of
three proteins derived from two genes. Each TGFβ gene
encodes a procytokine consisting of a C-terminal TGFβ
sequence and a larger N-terminal region that, after pro-
cessing, forms a protein called latency-associated pep-
tide (LAP). LAP and TGFβ remain noncovalently associ-
ated, and in this configuration TGFβ is unable to bind
to its receptors; that is, TGFβ is latent. In most cases,
the complex of LAP and TGFβ (the small latent complex,
SLC) is joined by latent TGFβ binding protein 1 (LTBP1),
a matrix protein with sequence similarity to the fibrillins,
and the complex of all three proteins is called the large
latent complex (LLC). Latent TGFβ can be linked by
LTBP to binding sites in the extracellular matrix (Taipale
et al., 1996).

The mechanisms involved in activating latent TGFβ
are not fully understood, but recently there has been
important progress in this area. Plasmin can activate
latent TGFβ in cell-free systems (Lyons et al., 1990) and
in cell culture (Sato et al., 1990). However, plasminogen
knockout mice display none of the pathologic features
of TGFβ knockout mice, suggesting that plasmin is un-
likely to be the only molecule activating TGFβ. Reactive
oxygen species can activate TGFβ in vitro (Barcellos-
Hoff and Dix, 1996), and radiation treatment appears
able to activate TGFβ in vivo via this mechanism (Bar-
cellos-Hoff et al., 1994). Thrombospondin (TSP) 1 can
activate TGFβ by binding to a defined site on LAP and
inducing a conformational change in the latent complex;
TGFβ is then bound to TSP1 in an active state (Schultz-
Cherry and Murphy-Ullrich, 1993; Schultz-Cherry et al.,
1995). A recent study of the similar patterns of inflam-
mation exhibited by TGFβ1 and TSP1 knockout mice
suggests that TSP1 is a major activator of TGFβ1 in
vivo (Crawford et al., 1998). However, the inflammatory
changes in the TSP1 knockout mice are not nearly as
severe as those in TGFβ1 knockout mice, suggesting
overlapping mechanisms of TGFβ activation.

LAP-β1 and LAP-β3 contain arginine-glycine-aspartic
acid (RGD) sequences, which are also binding site motifs
in ligands for a subset of integrins. LAP-β1 can bind
effectively to one such integrin, αvβ1, but the functional
role of LAP-integrin interactions is not known (Munger
et al., 1998). Integrins were first identified based on their
roles in mediating cell attachment and migration but
have recently been recognized to participate in more
complex cellular events, including survival (Meredith et
al., 1993), proliferation, and regulation of gene expres-
sion (Werb et al., 1989). The fact that αvβ1 can bind
latent TGFβ suggests that this or other RGD-binding
integrins might regulate TGFβ bioactivity.

The integrin αvβ6 is expressed principally on epithelial
cells, where it has been shown to be a receptor for RGD
sites in fibronectin (Weinacker et al., 1994), tenascin
(Prieto et al., 1993), and vitronectin (Huang et al., 1998a).
αvβ6 is expressed at low levels in healthy adult lung
tissues but is rapidly upregulated by injury and inflam-
mation (Breuss et al., 1995). Inactivation of the β6 sub-
unit gene in mice revealed an unexpected role for αvβ6
in downregulating inflammatory responses to minor en-
vironmental insults in the lungs and skin (Huang et al.,
1996). Somewhat surprisingly, despite exaggerated skin
and lung inflammation, β6-/- mice do not develop fibro-
sis at either site. The combination of enhanced inflam-
mation and protection from fibrosis suggested a local-
ized deficiency of active TGFβ1 as a cause of the β6-/-
phenotype. We therefore sought to determine whether
TGFβ1 LAP is a ligand for αvβ6 and whether interaction
of αvβ6 with LAP-containing complexes can lead to
latent TGFβ1 activation. To determine whether such an
effect might have relevance to disease, we also utilized
β6-/- mice in a well-characterized model of pulmonary
fibrosis induced by bleomycin, a model that has pre-
viously been shown to be critically dependent on TGFβ
(Giri et al., 1993).

Results 

TGFβ1 LAP Is a Ligand for the Integrin αvβ6
To determine whether LAP-β1 could bind αvβ6, we per-
formed affinity chromatography by passing labeled se-
creted αvβ6 over Sepharose cross-linked to recombi-
nant LAP, the known αvβ6 ligand fibronectin, or bovine
serum albumin (BSA; to detect nonspecific binding).
Bound proteins were eluted by EDTA, since interactions
of integrins with ligands require the presence of divalent
cations. Bands corresponding to truncated αv (130 kDa)
and β6 (85 kDa) were eluted by EDTA from LAP- or
fibronectin-Sepharose, but not from BSA-Sepharose
(Figure 1A). The identity of the 85 kDa band as β6 was
confirmed by Western blotting and by immunoprecipita-
tion (Figures 1B and 1C). To demonstrate that full-length
αvβ6 also binds to LAP, we repeated affinity chromatog-
raphy with unlabeled octylglucoside lysates of β6-trans-
fected SW480 cells (Figure 1D). A 95 kDa protein corre-
sponding to full-length β6 was detected by Western
blotting in eluted fractions from LAP-Sepharose but not
from BSA-Sepharose.

To determine the effects of αvβ6/LAP interactions on
cells, we performed cell adhesion assays with β6-trans-
fected SW480 cells. LAP-coated wells supported αvβ6-
dependent adhesion of β6-transfected cells, but mock-
transfected cells did not adhere to any concentration
of LAP (Figure 2B). Essentially identical results were
obtained with mock- and β6-transfected 293 cells, Chi-
nese hamster ovary (CHO) cells, and NIH 3T3 cells (not
shown). β6-transfected SW480 cells, but not mock
transfectants, also adhered to dishes coated with large
latent TGFβ1 complexes (LLC; Figure 2C). Adhesion to
LAP and LLC was abolished by anti-αvβ6 antibody 10D5
and was unaffected by antibodies against β1 (P5D2) or
αvβ5 (P1F6; not shown). To determine whether αvβ6
mediated adhesion to LAP through an interaction with
the RGD sequence, we performed cell adhesion assays
with mutant LAP containing a D-to-E substitution muta-
tion within the RGD site (Figure 2D). β6-transfected
SW480 cells did not attach to any concentration of mu-
tant LAP. Furthermore, adhesion of β6-transfected, but

Figure 1. Affinity Chromatography

(A) 35S-labeled secreted αvβ6 was incubated with either BSA-, fi-
bronectin- or LAP-Sepharose. Bound proteins were eluted with
EDTA and analyzed by SDS-PAGE under nonreducing conditions.
Lane 1 was the final fraction eluted with column buffer, lanes 2-6
were eluted with EDTA, and lane 7 was eluted with 8 M urea.

(B) Western blot of eluted fractions with anti-β6 antibody 4B5. Lane
1 was the final fraction eluted with column buffer, lanes 2-4 were
eluted with EDTA, and lane 5 was eluted with 8 M urea.

(C) Immunoprecipitation of CHO supernatant (lane 1) and CHO su-
pernatant proteins eluted by EDTA from BSA (lane 2) and LAP (lane
3) columns with anti-αvβ6 MAb R6G9.

(D) Western blot of proteins from octylglucoside lysates of β6-trans-
fected SW480 cells eluted from BSA- or LAP-Sepharose columns.
Lane 1 was the final fraction eluted with column buffer, lanes 2-5
were eluted with EDTA, and lane 6 was eluted with 8 M urea. Molecu-
lar size markers (in kDa) are shown to the left.


not mock-transfected, SW480 cells to either LAP or
equimolar concentrations of either small or large latent
TGFβ1 complexes containing LAP induced phosphory-
lation of two downstream integrin-signaling intermedi-
ates, the focal adhesion kinase (FAK) and paxillin (Figure
2E). Phosphorylation of each protein was completely
inhibited by addition of the αvβ6-blocking antibody
10D5 (not shown).

β6-Transfected Cells Induce TGFβ Activity
To determine if binding to αvβ6 activates TGFβ1, we
cocultured four different β6-expressing cells with mink
lung epithelial reporter cells stably expressing a portion
of the plasminogen activator inhibitor 1 promoter (TMLC)
(Abe et al., 1994). For all four lines, coculture with β6-
expressing cells caused a significant increase in lucifer-
ase levels compared to coculture with control cells (Fig-
ure 3A). These increases were abolished by MAbs
against active TGFβ or αvβ6. A different reporter cell line
(NIH 3T3 cells transfected with PAI1/luciferase) yielded

Figure 2. Adhesion of β6-Transfected Cells to LAP

(A) Control and β6-transfected SW480 cells
were stained with anti-αvβ6 MAb E7P6 (white
peaks represent control cells; black peaks
represent β6 transfectants) and analyzed by
flow cytometry. Dotted lines represent β6-
transfected cells incubated with PBS.

(B and C) Nontransfected cells and β6-trans-
fected cells were allowed to attach to wells
coated with increasing concentrations of LAP
(B) or LLC (C) or with 1% BSA. Prior to plating,
cells were incubated with or without anti-
αvβ6 antibody 10D5.

(D) Adhesion of SW480-β6 cells to recombi-
nant LAP containing a single glutamic acid
for aspartic acid substitution mutation in the
RGD site (RGE LAP) was compared with ad-
hesion to authentic recombinant LAP.

(E) Lysates of mock- and β6-transfected SW480
cells plated for 30 min on poly-L-lysine (PLL)
or on poly-L-lysine plus fibronectin (Fn), LAP,
SLC, or LLC were immunoprecipitated with
antibodies against FAK or paxillin followed
by Western blotting with either anti-phospho-
tyrosine antibody 4G10, anti-FAK, or anti-
paxillin.


similar results (data not shown). Three cell lines that are
capable of adhering to immobilized LAP via αvβ1 (293,
MG63, and A549 cells) (Munger et al., 1998) did not
activate TGFβ in similar assays (Figure 3A and unpub-
lished data). Antibodies against the integrin β1 subunit
or the integrin αvβ5 had no effect on activation (not
shown).

To determine whether TGFβ activation by αvβ6 re-
quired cell-cell contact, we did coculture assays with
inserts to separate reporter and β6-expressing cells by
a few millimeters while allowing soluble molecules to
pass. In the absence of contact, β6-expressing cells
caused a slight induction of luciferase activity, but in-
duction was minimal compared to β6-expressing cells in
contact with the reporter cells (Figure 3B). These results
indicate that the active TGFβ generated by αvβ6 is most
efficiently detected by cells in contact with the β6-
expressing cells, but that at least a small amount of the
active TGFβ formed is freely diffusible.

To determine if increased secretion of latent TGFβ
by β6-expressing lines could account for the results,
serum-free medium conditioned by each line was tested
for total TGFβ activity. β6-transfected CHO cells se-
creted more latent TGFβ (8-fold) than did mock-trans-
fected CHO cells. However, in the other three cell types,
latent TGFβ secretion was higher in the control lines
(data not shown). All four lines secreted TGFβ1 as the
predominant TGFβ isoform. Cocultures in serum-free
conditions yielded results essentially identical to those
presented, so latent TGFβ secreted by the cocultured
cells is sufficient for measurable TGFβ activation. To
determine whether the observed active TGFβ was spe-
cifically TGFβ1, we added isoform-specific neutralizing
antibodies against TGFβ1, β2, and β3 to cocultures con-
taining β6-transfected SW480 cells. Anti-TGFβ1 blocked
luciferase induction, whereas anti-TGFβ2 and anti-TGFβ3
did not (Figure 3C). In addition, recombinant LAP (which
both neutralizes TGFβ in solution and binds the αvβ6
integrin) blocked TGFβ activation.


αvβ6-Mediated Activation of TGFβ1 Does
Not Require Other Known
Activators of TGFβ

We next tested whether αvβ6-induced activation of
TGFβ1 was occurring through previously described
mechanisms of TGFβ activation. Activation of TGFβ by
cocultures of endothelial cells and vascular smooth
muscle requires plasmin (Sato et al., 1990), binding of
mannose-6-phosphate on LAP (Dennis and Rifkin, 1991),
and incorporation of LLC into the ECM via tissue trans-
glutaminase (Kojima et al., 1993; Nunes et al., 1997).
Therefore, we tested the effects of inhibitors that block
each of these steps: the plasmin inhibitor aprotinin, M6P,
inhibitors of transglutaminase-mediated cross-linking
(cystamine and monodansylcadaverine), and a poly-
clonal antibody against the N terminus of LTBP1 (Ab450)
that blocks LTBP linkage to the ECM (Nunes et al., 1997).

Figure 3. β6-Expressing Cells Activate TGFβ

(A) Equal numbers of reporter and test cells
were cultured 16-20 hr and lysed for mea-
surement of luciferase activity. Results are
for four test cell lines. Additions are shown
at the bottom (anti-TGFβ: MAb 1D11, 10 μg/ml;
anti-β6: MAb 10D5, 10 μg/ml). Relative
luciferase activity is the measured activity di-
vided by the activity of the coculture with
mock-transfected cells. Results are the mean
(±SEM) of at least three experiments done in
duplicate.

(B) The effect of close proximity between re-
porter and test cells on TGFβ activation was
determined using culture inserts. Equal num-
bers of reporter cells in the bottom well, test
cells in the bottom well, and test cells in the
insert were cocultured for 16-20 hr and lucif-
erase activity of the reporter cells was mea-
sured. Relative luciferase activity is measured
activity divided by the activity of reporter cells
cultured with control cells in both the bottom
well and the insert. Results are the means
(±SEM) of triplicate measurements.

(C) Activation of TGFβ by β6-expressing cells
does not require the activity of proteases or
molecules involved in other systems of TGFβ
activation and involves only TGFβ isoform 1.
Mock- or β6-transfected SW480 cells were
cocultured with reporter cells as described in
(3A). Additions are indicated at the bottom.
Data represent the mean (±SEM) of quadru-
plicate measurements.
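
The normalization described in this legend (measured luciferase activity divided by the activity of the control coculture, reported as mean ± SEM) can be illustrated with a short calculation. The following Python sketch is only an illustration: the function names and the raw readings are hypothetical and are not taken from the experiments reported here.

```python
from math import sqrt
from statistics import mean, stdev


def relative_activity(measured, baseline):
    """Divide each measured luciferase reading by the control (baseline) reading."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return [m / baseline for m in measured]


def mean_sem(values):
    """Return (mean, standard error of the mean) for two or more values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the SEM")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical raw readings in arbitrary light units (not data from the paper).
mock_coculture = 1000.0                      # coculture with mock-transfected cells
beta6_coculture = [2900.0, 3100.0, 3000.0]   # three cocultures with beta6 transfectants

rel = relative_activity(beta6_coculture, mock_coculture)
avg, sem = mean_sem(rel)
print(f"relative luciferase activity: {avg:.2f} +/- {sem:.2f}")
```

The SEM here uses the sample standard deviation (n - 1 in the denominator), which is an assumption; the legend does not state which estimator was used.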


Because other proteases can activate TGFβ in vitro
(Munger et al., 1998), we tested inhibitors of metallo-,
aspartic, and cysteine proteases (BB94, BB2516, leu-
peptin, pepstatin A, and E64). Finally, TSP1-mediated
activation of TGFβ1 can be blocked by MAb 133 (Schultz-
Cherry et al., 1994). None of these inhibitors blocked
the activation observed in cocultures with β6-express-
ing cells (Figure 3C).


Binding of LAP to αvβ6 Integrin Is Not Sufficient
for Latent TGFβ1 Activation

To determine whether binding to αvβ6 is sufficient for
activation of TGFβ or whether additional interactions
with cell components are required, we examined the
effects of truncation mutations of the β6 subunit cyto-
plasmic domain. Three mutants were examined; of
these, only mutant 777T, which lacks the last 11 amino

Figure 4. Binding of LAP to αvβ6 Integrin Is
Not Sufficient for Latent TGFβ Activation

(A) Schematic representation of the wild-type
β6 integrin subunit (full-length β6) and the
three truncation mutants studied.

(B) Mock- or various β6-transfectants were
incubated in suspension with 0.1 μg/ml LAP
for 30 min at 37°C and analyzed by flow cy-
tometry with anti-LAP MAb VB3A9. Relative
fluorescence is the mean fluorescence of
each sample divided by the mean fluores-
cence of mock transfectants. Results are the
mean (±SEM) of at least three experiments.

(C) Equal numbers of reporter and test cells
were cultured for 16-20 hr and luciferase ac-
tivity was measured. For (C-E), relative lucif-
erase activity is the measured activity divided
by the activity of the TMLC alone and results
are the mean (±SEM) of at least three experi-
ments done in duplicate.

(D) TMLC were cultured for 16-20 hr in the
presence or absence of cytochalasin D (100
μM), human recombinant TGFβ1 (10 pM), or
anti-TGFβ (MAb 1D11, 10 ng/ml).

(E) Equal numbers of reporter cells and test
cells were cultured for 16-20 hr in the pres-
ence or absence of cytochalasin D (100 μM),
anti-αvβ6 integrin (MAb 10D5, 50 μg/ml), or
anti-TGFβ (MAb 1D11, 10 μg/ml).


acids of the β6 cytoplasmic domain, localizes to focal
contacts (Cone et al., 1994). To determine whether dele-
tions in the β6 cytoplasmic domain would affect latent
TGFβ1 binding, transfectants were incubated with LAP
and analyzed by flow cytometry with antibody to LAP.
More LAP was detected on the surface of cells express-
ing wild-type αvβ6 than on mock transfectants (Figure
4B). Mutant 747T showed no binding above background,
but both 770T and 777T showed LAP binding similar to
wild-type αvβ6. In coculture assays, cells expressing
mutants 747T and 770T showed little or no activation
of latent TGFβ (Figure 4C). In contrast, mutant 777T
activated latent TGFβ. No consistent difference was de-
tected in total TGFβ secreted by these transfectants
(data not shown). Since 770T bound LAP but failed to
activate latent TGFβ, binding of LAP by αvβ6 is not
sufficient for activation of latent TGFβ.
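
The inference in this paragraph is a counterexample argument: a single transfectant that binds LAP without activating latent TGFβ is enough to rule out sufficiency. The Python sketch below encodes the qualitative outcomes stated in the text as booleans and checks that logic. The class and function names are hypothetical, and the yes/no coding is a simplification of graded measurements ("little or no activation" is coded as no activation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfectant:
    name: str
    binds_lap: bool        # LAP binding above mock background (Figure 4B)
    activates_tgfb: bool   # activation of latent TGF-beta in coculture (Figure 4C)


# Qualitative outcomes as described in the text.
observations = [
    Transfectant("wild-type", True, True),
    Transfectant("747T", False, False),
    Transfectant("770T", True, False),
    Transfectant("777T", True, True),
]


def binding_is_sufficient(obs):
    """True only if every LAP-binding transfectant also activates TGF-beta."""
    return all(t.activates_tgfb for t in obs if t.binds_lap)


def consistent_with_necessity(obs):
    """True if no transfectant activates TGF-beta without binding LAP."""
    return all(t.binds_lap for t in obs if t.activates_tgfb)


counterexamples = [t.name for t in observations
                   if t.binds_lap and not t.activates_tgfb]

print("binding sufficient:", binding_is_sufficient(observations))
print("consistent with binding being necessary:",
      consistent_with_necessity(observations))
print("counterexamples to sufficiency:", counterexamples)
```

Note that the second check shows only that these four observations do not contradict a requirement for binding; it does not by itself establish that binding is necessary.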

To determine whether an intact cytoskeleton is re-
quired for TGFβ activation, we cocultured reporter cells
with β6-transfected SW480 cells in the presence of 100
μM cytochalasin D, conditions under which 100 percent
of the cells became round but remained adherent. Cyto-
chalasin D did not inhibit LAP binding to the cell surface,
nor did it affect surface expression of αvβ6 or TGFβ
secretion (not shown). Cytochalasin D had no effect on
the ability of reporter cells to respond to active TGFβ
added to the culture medium (Figure 4D). However, cyto-
chalasin D added to cocultures of β6 transfectants and
reporter cells blocked activation of latent TGFβ (Figure
4E). The inhibition was similar to that achieved by anti-
body to αvβ6 or to TGFβ. Thus, an intact cytoskeleton
is required for αvβ6-mediated activation of TGFβ.

β6-/- Mice Are Protected against Bleomycin-Induced
Pulmonary Fibrosis

Pulmonary fibrosis was evaluated by examination of
lung morphology and by measurement of hydroxypro-
line content in β6+/+ and β6-/- 129 strain mice at 15, 30,
and 60 days after intratracheal instillation of bleomycin.
Fibrosis was significant in bleomycin-treated wild-type
mice by 30 days and progressed to 60 days (Figure
5A), whereas in β6-/- mice, lung morphology remained
nearly unaltered throughout the experiment, with only
small patches of fibrosis (Figure 5B), and the lung hy-
droxyproline content was not significantly different from
that measured in saline-treated animals. Similar results
were obtained in offspring of 129 by C57Bl/6 inter-
crosses (not shown). These results suggest that expres-
sion of αvβ6 is required for pulmonary fibrosis in re-
sponse to bleomycin.

Figure 5. β6-/- Mice Are Protected against Bleomycin-Induced Pul-
monary Fibrosis

(A) Bleomycin (0.03 iu: blm) induces pulmonary fibrosis in β6+/+ but
not β6-/- mice, indicated by an elevation of lung hydroxyproline
content compared with saline (sal)-treated controls 30 and 60 days
after administration. Data from saline-treated mice at each time
point are combined, as there was no significant difference between
groups. Data are expressed as means (±SEM) of five to seven
observations, *p < 0.01.

(B) Histology of low power sections (magnification 200×) demon-
strates dense accumulation of collagenous extracellular matrix in
lungs of bleomycin-treated β6+/+ but not β6-/- mice 60 days after
injection.

To determine whether the resistance of β6-/- mice to
bleomycin-induced lung injury and fibrosis was due to
a blunted inflammatory response, we counted inflamma-
tory cells obtained from bronchoalveolar lavage (BAL) or
minced lungs from β6-/- and β6+/+ mice after treatment
with saline and 5 and 15 days after treatment with bleo-
mycin. Bleomycin increased the total cell counts and
the numbers of neutrophils, lymphocytes, and macro-
phages in both lines of mice, but the effects were always
greater in β6-/- mice. These findings are consistent with
our previous report of enhanced lung inflammation in
β6-/- mice and suggest that protection from bleomycin-
induced pulmonary fibrosis is not due to inhibition of
the inflammatory response to bleomycin.

To determine whether the exaggerated inflammation
and protection from injury and fibrosis in β6-/- mice
was due to impaired synthesis of TGFβ1, we analyzed
TGFβ protein expression by immunohistochemistry and
by the TMLC bioassay on eluates from lung slices that
had been heated to 80°C for 20 min to release and
activate TGFβ. Specificity of the bioassay was con-
firmed by >80% inhibition of all samples by anti-TGFβ1
antibody. TGFβ eluted from lung slices was not different
between lines or between saline and bleomycin treat-
ment (relative luciferase activity compared to TMLC
alone): saline, 6.5 ± 1.6 (mean ± SEM) for β6+/+ mice
and 6.3 ± 1.4 for β6-/- mice; bleomycin, 5.5 ± 1.6 for
β6+/+ mice and 5.2 ± 1.1 for β6-/- mice. Furthermore,
immunohistochemistry with an antibody against a 30-
amino-acid C-terminal peptide of TGFβ1 (LC1-30) under
conditions reported to detect both active and inactive
TGFβ (Barcellos-Hoff et al., 1995) revealed the presence
of TGFβ throughout the lungs and airways in both saline-
and bleomycin-treated animals, with no detectable in-
crease at any time point after bleomycin treatment in
either line of mice. Immunohistochemistry under condi-
tions reported to detect only active TGFβ demonstrated
little staining at any time point (data not shown).
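
As a rough check on the statement that eluted TGFβ did not differ between lines, the reported means and SEMs can be compared by dividing the difference between two means by the combined standard error. The Python sketch below does this for the four values quoted above. It is not the statistical test used in the study (no test is described for these data, and group sizes are not given here); the function name and the treatment of the two groups as independent are assumptions made for illustration.

```python
from math import sqrt


def standardized_difference(mean_a, sem_a, mean_b, sem_b):
    """Difference of two means in units of their combined standard error.

    Assumes the two groups are independent, so the standard errors
    add in quadrature.
    """
    return (mean_a - mean_b) / sqrt(sem_a ** 2 + sem_b ** 2)


# (mean, SEM) of relative luciferase activity as reported in the text:
# first tuple is beta6 +/+ mice, second tuple is beta6 -/- mice.
reported = {
    "saline": ((6.5, 1.6), (6.3, 1.4)),
    "bleomycin": ((5.5, 1.6), (5.2, 1.1)),
}

ratios = {}
for treatment, ((m_wt, s_wt), (m_ko, s_ko)) in reported.items():
    ratios[treatment] = standardized_difference(m_wt, s_wt, m_ko, s_ko)
    print(f"{treatment}: difference = {ratios[treatment]:.2f} combined standard errors")
```

In both treatment groups the difference between lines is a small fraction of one combined standard error, which is consistent with the conclusion drawn in the text.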

αvβ6 Protein Expression Is Focally Induced
by Bleomycin in β6+/+ Mice

We have previously reported that αvβ6 is expressed at
low levels in skin and lung epithelium, but that expres-
sion is dramatically upregulated in cutaneous wounds
and in injured and inflamed epithelia (Breuss et al., 1995).
To determine whether bleomycin treatment produced
similar increases in αvβ6 expression, we performed im-
munohistochemistry on lung sections from β6+/+ mice
10 days after treatment with either saline or bleomycin.
As expected, no αvβ6 immunoreactivity was seen in
lungs from β6-/- mice. Diffuse, low-level expression of
αvβ6 was apparent in airway and alveolar epithelial cells
in β6+/+ mice treated with saline, whereas focal areas
with markedly increased expression of αvβ6 were pres-
ent throughout the lungs of bleomycin-treated animals
(Figure 6A).

Keratinocytes and Airway Epithelial Cells
Activate TGFβ1 through αvβ6

To determine directly whether αvβ6 expressed in mouse
skin could activate TGFβ1, we performed bioassays by
coculturing keratinocytes obtained from β6-/- or β6+/+
mice with TMLC (Figure 6C). β6+/+ cells expressed abun-
dant amounts of αvβ6 (Figure 6B) and demonstrated
TGFβ1 activity in this assay (Figure 6C), whereas β6-/-
cells did not induce TGFβ1 activity. Because of the diffi-
culty of culturing murine lung epithelial cells, we per-
formed similar studies using primary cultures of human
bronchial epithelial cells, which also demonstrated sig-
nificant expression of αvβ6 (Figure 6B). These cells also
induced αvβ6-dependent TGFβ1 activity.

Discussion 

In this report, we show that LAP-β1 is a ligand for the
integrin αvβ6 and that αvβ6-expressing cell lines can
activate endogenous latent TGFβ1. Furthermore, β6-/-
mice are protected from bleomycin-induced pulmonary
fibrosis, a model that has been shown to be critically
dependent on TGFβ activity. In the mice we studied,
TGFβ1 was constitutively expressed in the lungs, and
the amount of total TGFβ protein was not demonstrably
different in β6-/- and β6+/+ mice and was not signifi-
cantly affected by treatment with bleomycin. However,

Figure 6. Bleomycin Focally Increases αvβ6 Expression in β6+/+
Mice, and Keratinocytes and Airway Epithelial Cells Expressing αvβ6
Activate TGFβ1

(A) Representative sections from lungs of β6+/+ mice 10 days after
treatment with saline or bleomycin. Arrows, normal staining of con-
ducting airway epithelium; arrowheads, alveolar epithelial cells with
dramatically increased αvβ6 expression.

(B) Primary cultures of β6+/+ and β6-/- keratinocytes or human
bronchial epithelial cells (HBE) were stained with anti-αvβ6 antibod-
ies 10D5 or E7P6 (shaded histograms) or PBS and analyzed by flow
cytometry.

(C) β6+/+ and β6-/- keratinocytes or HBE were cocultured with
TMLC for 16-20 hr in the presence or absence of anti-αvβ6 antibody
10D5 or anti-TGFβ1 antibody (1D11). Relative luciferase activity is
measured activity divided by activity of TMLC alone. Data are the
mean (±SEM) of at least four measurements.


in response to bleomycin, αvβ6 expression was dramati-
cally increased in the lungs of β6+/+ mice. Together with
the observations that β6-/- mice develop inflammation
in the skin and lungs (partially reproducing the TGFβ1
knockout phenotype), the results of this study indicate
that the regulated expression of αvβ6 by epithelia is
important for local activation of TGFβ1 in response to
injury and inflammation. This idea is consistent with a
model in which tissue injury induces αvβ6 expression,
which in turn locally activates TGFβ1 already abundantly
present in many tissues. TGFβ1, once activated, en-
hances matrix deposition (healing or fibrosis) and down-
regulates the inflammatory response to injury.

This feedback model highlights the fact that resolution
of inflammation is an active process. There are two regu-
latory pathways that might allow rapid amplification of
this antiinflammatory feedback system. First, TGFβ itself
induces β6 integrin subunit expression (Sheppard et al.,
1992; Wang et al., 1996). Second, TGFβ1 induces its
own expression (Van Obberghen-Schilling et al., 1988).
Presumably, mechanisms exist to reverse these positive
feedback effects; these mechanisms may fail in patho-
logic states of persistent TGFβ activity and fibrosis that
involve epithelia.

The β6 knockout mice develop inflammation only in
skin and lung and not in other tissues where β6 is ex-
pressed (e.g., uterus, renal epithelium, urinary bladder).
This selectivity could be a consequence of the unique
susceptibility of the skin and lung to environmental in-
sults, leading to subclinical inflammation that must be
actively repressed. For example, skin involvement oc-
curs in areas most exposed to physical trauma, and
lung inflammation in β6-/- mice is worse when mice are
housed in unventilated cages. However, mice express-
ing a null mutation in the TGFβ1 gene develop exagger-
ated inflammation in multiple organs (Shull et al., 1992).
In addition, in keeping with the known effects of TGFβ1
in inhibiting proliferation of epithelial cells, these mice
demonstrate increased mitoses and epithelial hyperpla-
sia in multiple epithelial organs. Despite careful morpho-
logic examination of the liver, pancreas, bladder, stom-
ach, uterus, and intestine, we have been unable to
identify any of these abnormalities in β6-/- mice. To-
gether, these data demonstrate that binding to αvβ6 is
not the principal mechanism of TGFβ1 activation in most
organs and that the developmental effects of TGFβ1 do
not require activation by interaction with this integrin.
Whereas other activation mechanisms are involved in
developmental effects of TGFβ1, interaction with αvβ6
appears to be important for locally titrating the aug-
mented TGFβ1 activity required in response to injury,
at least in the lungs and skin.

Previous studies of TGFβ activation suggested the
critical involvement of proteases, particularly plasmin.
Our results, along with other recent work, suggest that
nonproteolytic mechanisms are important physiologic
pathways leading to active TGFβ. TSP1-mediated acti-
vation occurs when TSP1 binds LAP-β1 at a site near its
N terminus. This presumably induces a conformational
change that activates the complex, although the active
TGFβ1 molecule remains bound to TSP1. αvβ6 binds
LAP at the RGD site located near the C terminus. The
nonproteolytic mechanisms must rely on an intrinsic
ability of LAP to adopt different conformations. Confor-
mational flexibility of LAP has already been documented
in circular dichroism studies that showed recombinant
free LAP undergoing a major conformational change
upon binding TGFβ in solution (McMahon et al., 1996).

While αvβ6 expression is clearly necessary for the
TGFβ activation mechanism we describe, is it sufficient?
If αvβ6 is not sufficient, its role might simply be to con-
centrate latent TGFβ at the cell surface, thereby permit-
ting some separate mechanism to activate TGFβ. In

Figure 7. Model of TGFβ1 Activation by αvβ6
The data presented suggest that when latent TGFβ1 complexes
bind to αvβ6, sites in the β6 cytoplasmic domain become accessible
for binding to the actin cytoskeleton. Cytoskeleton-associated inte-
grin then induces a change in the conformation of the latent com-
plex, allowing access of mature TGFβ1 to TGFβ receptors and induc-
tion of classic TGFβ signaling.


attempting to answer this question, we tested whether
molecules or processes known to be involved in other
systems of TGFβ activation are required. Previous stud-
ies of endothelial and smooth muscle cell cocultures,
which activate latent TGFβ, suggested that plasmin,
the IGF-II/M6PR, and transglutaminase-mediated cross-
linking of latent TGFβ to the ECM are all required for
activation. However, our results indicate that none of
these molecules or processes, nor TSP1 or a wide range
of proteases, is involved in αvβ6-mediated activation of
TGFβ. In addition, the fact that we observe activation
using six different αvβ6-expressing cell lines and two
reporter cell lines suggests that any additional mole-
cules that are required must be widely expressed.

We have identified one additional requirement for acti-
vation: the ability of αvβ6 to connect with the actin cy-
toskeleton. Cells expressing mutant β6 were able to
activate TGFβ only when the mutant integrin could local-
ize to focal contacts, a process that involves clustering
and mechanical linkage of integrins to the actin cytoskel-
eton in complexes containing an array of adapter pro-
teins that includes FAK and paxillin. Cells expressing
β6 mutants that do not localize to focal contacts do
not activate latent TGFβ, even though one of these β6
mutants (770T) is still able to bind LAP via αvβ6. Cyto-
chalasin D, which disrupts actin filaments, blocked
TGFβ activation by cells expressing αvβ6. These results
suggest that binding of latent TGFβ to αvβ6 per se is
not sufficient for activation to occur; following binding,
αvβ6 must also associate with the actin cytoskeleton in
order to activate bound latent TGFβ (see model, Figure
7). Thus, modulation of cytoskeleton/αvβ6 interactions
might be a means to regulate TGFβ1 activation indepen-
dent of changes in αvβ6 expression.

Of the integrins known to bind RGD sequences, three
are now known to bind to LAP-β1 (αvβ1, αvβ6, and,
weakly, αvβ5), and one, the platelet integrin αIIbβ3, may
(Grainger et al., 1995). The main functions heretofore
ascribed to LAP are TGFβ latency and the facilitation
of TGFβ secretion (Gray and Mason, 1990). The finding
that multiple integrins can bind TGFβ1-LAP raises the
possibility of an additional function, the ability to initiate
signaling via integrins. The results of the present study
demonstrate that LAP-containing latent TGFβ1 complexes
can induce phosphorylation of at least two components
of integrin-signaling complexes, FAK and paxillin.
This finding raises the possibility that these "latent"
complexes could initiate integrin-mediated effects on
cell behavior.

The observation that αvβ6 induces TGFβ1 activity also
suggests an alternative mechanism by which at least
one integrin can affect cell behavior: by activating extracellular
TGFβ1 that, in turn, initiates responses by
binding to its own cognate receptor(s). This mechanism
appears to explain the exaggerated lung and skin inflammation
and protection from pulmonary fibrosis in β6−/−
mice and suggests the possibility of regulating local
inflammation and fibrosis by targeting this integrin.

Experimental Procedures 

Cell Lines, Antibodies, and Reagents 

Cell lines were obtained from American Type Culture Collection and
transfected with integrin expression plasmids as described (Weinacker
et al., 1994). Mink lung epithelial cells stably transfected with
a plasmid containing the luciferase cDNA downstream of a TGFβ-sensitive
portion of the plasminogen activator inhibitor 1 promoter
(TMLC) were used as described (Abe et al., 1994). Mouse anti-αvβ6
MAbs E7P6, R6G9 (Weinacker et al., 1994), and 10D5 (Huang et al.,
1998a), rabbit anti-β6 MAbs 4B5 and B1 (Huang et al., 1998b), and
mouse MAb VB3A9 against TGFβ1 LAP (Munger et al., 1998) were
produced as described. Mouse anti-phosphotyrosine MAb 4G10
was obtained from Upstate Biotechnology, Inc. (Lake Placid, NY);
mouse MAbs against FAK and paxillin were obtained from Transduction
Laboratories (Lexington, KY); and rabbit polyclonal antibodies
against FAK were obtained from Santa Cruz Biotechnology (Santa
Cruz, CA). LAP and LAP (RGE) were produced in a baculovirus
system as described (Munger et al., 1998). Recombinant SLC and
LLC were gifts of Drs. H. Ohashi and H. Tsumura (Kirin Brewery Co.,
Gunma, Japan). MAb 1D11 against active TGFβ (all isoforms), anti-TGFβ1
polyclonal chicken Ig (AF-101-NA), anti-TGFβ2 polyclonal
goat IgG (AB-112-NA), and anti-TGFβ3 polyclonal goat IgG (AB-244-NA)
were from R and D Systems, Minneapolis, MN. Anti-TSP1
MAb 133 (Schultz-Cherry and Murphy-Ullrich, 1993) was a gift of
Dr. Murphy-Ullrich (University of Alabama, Birmingham). Rabbit
polyclonal antiserum LC1-30 against a C-terminal peptide of TGFβ1
was a gift of Kathy Flanders (National Cancer Institute, Bethesda,
MD). Anti-LTBP1 polyclonal rabbit antibody 450 was produced as
described (Nunes et al., 1997). Other reagents were all analytical
grade.

Cell lines were grown in Dulbecco's modified Eagle's medium
(DMEM) with 4.5 g/l glucose, L-glutamine, and 10% fetal bovine
serum. Murine keratinocytes were obtained and grown as previously 
described (Huang et al., 1996). Human bronchial epithelial cells were 
purchased from Clonetics, grown in serum-free bronchial epithelial 
cell medium (Clonetics), and used at passage 1. 

Affinity Chromatography 

LAP, BSA, and chymotrypsin-digested fibronectin were coupled to
cyanogen bromide-activated Sepharose essentially as described
(Pytela et al., 1985). Affinity matrices contained 2.5 mg/ml of fibronectin,
4 mg/ml of BSA, and 7 mg/ml of LAP. Columns were washed
and blocked with 1% BSA. Secreted αvβ6 was produced as described
(Weinacker et al., 1994). Culture medium was passed
through affinity columns, and bound proteins were eluted with column
buffer, then with 20 mM EDTA in 50 mM Tris and 150 mM NaCl,
and finally with 8 M urea. Octylglucoside lysates of β6-transfected
SW480 cells were used for affinity chromatography under the same
conditions with addition of 25 mM octylglucoside.

Immunoprecipitation

Samples were incubated with antibodies for 3 hr at 4°C. Immune
complexes were collected by incubation for 1.5 hr with protein
G-Sepharose. Beads were washed three times, boiled for 3 min in
Laemmli sample buffer, and then analyzed by SDS-PAGE and autoradiography.

Western Blotting 

Proteins were separated by SDS-PAGE, transferred to a nylon membrane,
and blocked for 1 hr in Tris-buffered saline containing 3%
BSA or 5% skim milk. After incubation with primary antibody for 3 
hr and then with peroxidase-conjugated secondary antibody for 1 
hr, blots were developed with ECL (Amersham). 

Cell Adhesion Assays 

The assays were performed as previously described (Busk et al.,
1992). Untreated polystyrene 96-well flat-bottom microtiter plates
(Flow Laboratories, McLean, VA) were coated with LAP or 1% BSA.
Cells were plated at 50,000 cells/well, and plates were centrifuged
(top side up) at 10 g for 5 min and then incubated for 1 hr at 37°C.
Nonadherent cells were removed by centrifugation, and attached
cells were fixed, stained, and lysed with 50 µl of 2% Triton and
quantified by measuring absorbance at 595 nm.

Flow Cytometry 

Cells were blocked with normal goat serum, washed with PBS, and 
incubated with primary antibody for 20 min and then with phycoer- 
ythrin-conjugated secondary antibody (Boehringer Mannheim) for 
20 min at 4°C. Cells were resuspended with PBS and analyzed by 
FACScan (Becton Dickinson, Rutherford, NJ). 

TGFβ Bioassay

TMLC and test cells were suspended at 5 × 10⁵ cells/ml in DMEM
containing 10% FCS. TMLC were plated first at 50 µl per microtiter
well (Microtest III plates, Falcon, Franklin Lakes, NJ) and allowed
to attach for 1 hr. Keratinocytes and bronchial epithelial cells were
suspended at 4-fold higher density. Medium was replaced with 50
µl/well of the same medium with or without additions (e.g., antibodies).
Fifty microliters of test cell suspension or test solutions was
added and plates were cultured for 16-20 hr. Lysates were assayed
for luciferase activity as described (Abe et al., 1994). Similar cocultures
were done in 24-well plates (Costar model 3526, Corning,
NY) with inserts designed for attachment-dependent cell culture
(Millicell-PCF 3 µm filter, Millipore, Bedford, MA), but 300 µl of
reporter and test cells were added to upper and/or lower chambers.
To elute TGFβ from lung slices, tissues were quick frozen in liquid
nitrogen, and five 20 µm cryosections were incubated for 20 min in
500 µl of DMEM at 80°C.
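
The plating arithmetic in this procedure can be sketched as follows. This is an illustration only, not part of the published method: it assumes the suspension density reads 5 × 10⁵ cells/ml, and the helper name `cells_per_well` is hypothetical.

```python
# Seeding arithmetic for the coculture bioassay described above.
# Assumption: the base density is 5e5 cells/ml (as read from the text).
# The helper name is hypothetical and used only for illustration.

def cells_per_well(density_per_ml: float, volume_ul: float) -> float:
    """Number of cells delivered by `volume_ul` microliters of suspension."""
    return density_per_ml * volume_ul / 1000.0

BASE_DENSITY = 5e5      # cells/ml for TMLC and test cell lines
WELL_VOLUME_UL = 50     # microliters plated per microtiter well

tmlc = cells_per_well(BASE_DENSITY, WELL_VOLUME_UL)
# Keratinocytes and bronchial epithelial cells: 4-fold higher density
primary = cells_per_well(4 * BASE_DENSITY, WELL_VOLUME_UL)

print(f"TMLC per well: {tmlc:.0f}")
print(f"Primary cells per well: {primary:.0f}")
```

Under these assumptions each well receives 25,000 reporter cells, and 100,000 cells when the 4-fold denser keratinocyte or bronchial epithelial suspensions are used.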

Bleomycin Treatment 

Age- and sex-matched 8- to 12-week-old β6+/+ and β6−/− mice of
strains 129/terSVEMS and 129/terSVEMS by C57Bl/6 were maintained
in a specific pathogen-free environment. Bleomycin (Mead
Johnson, Princeton, NJ) was dissolved in sterile saline (0.03 or 0.05
units in 60 µl). Bleomycin or saline was administered transtracheally
under methoxyflurane anesthesia by direct cut down.

Hydroxyproline Assay 

Hydroxyproline content was measured in whole mouse lungs by 
methods previously described with modifications (Woessner, 1961). 
Following perfusion with PBS and homogenization, samples were 
incubated on ice in tricarboxylic acid (50%; Sigma Chemical Co., 
St. Louis, MO) and baked in 12 N hydrochloric acid (Mallinckrodt 
Baker Inc., Paris, KY) for 24 hr at 110°C. Aliquots reconstituted with
distilled water were added to 1.4% chloramine T (Sigma) in 10%
isopropanol and 0.5 M sodium acetate for 20 min. Ehrlich's solution
(Sigma) was added and incubated at 65°C for 15 min. Absorbance 
was measured at 550 nm. 

Histology and Immunohistochemistry 

The trachea and both lungs were fixed by inflation at 25 cm H2O
with 10% formalin and embedded in paraffin (for histology and total
TGFβ staining with antibody LC1-30) or inflated with 50% OCT and
quick frozen in liquid nitrogen. Five micrometer sections were
stained with hematoxylin and eosin and with trichrome to identify
extracellular collagen. Sections were fixed in cold acetone for β6
antibody (B1) or in methanol/acetone for "active" TGFβ staining with
antibody (LC1-30). Sections were blocked with Peroxoblock (Zymed
Lab) and Avidin/Biotin Blocking Kit (Vector) and rinsed and incubated
with 3% goat serum in PBS for 15 min and then overnight at
4°C in primary antibody. Sections were incubated in biotin-labeled
secondary antibody for 1 hr and in ABC avidin/peroxidase reagent
(Vector Lab) for 1 hr at room temperature, and chromogen was
developed using the DAB Plus Kit (Zymed).

MS# M3-3813
Forsberg and Johansson

This manuscript describes the preliminary and initial purification of another member of the
integrin superfamily, which was named here alpha9beta1 (α9β1). The authors also
characterize some of the peptide/protein binding properties of α9β1 and they use other rat
tissues to attempt to localize its occurrence. While the authors may have purified a new
protein, there are some problems associated with their observations and I have attempted
to summarize my major criticisms below (I feel that the authors have done some interesting
preliminary data on α9β1 yet I also feel that their results are too preliminary and the data is
not strongly supported by quantitative observations):

1. The description of the protocol used was minimally described, for instance, what
temperature was used throughout the homogenization?, what temperature and flow rate
were employed for the column chromatography?, what matrix volumes were used for
these preparation of insolubilized ligands?, what were the amounts of proteins and
peptides coupled to these matrices?, what sort of 'control matrix' was used to measure
non-specific adsorption of these integrins past the WGA-Sepharose matrix? (this is
especially important due to the lack of any apparent specificity of α9β1 to interact with
immobilized RGD-containing peptides and the modified derivatives found in this work).

2. What is the status of the cysteine sulfhydryl group used in the peptide and the
matrix? Was it effectively deprotected following its synthesis to liberate the sulfhydryl
moiety? Is it disulfide-linked to another peptide?

3. What percentage acrylamide gel was used in these studies?

4. What is the specific activity of the ¹²⁵I-labeled integrin used in these studies? It
would be impossible to reproduce this information described in the paper and to quantitate
the binding data shown without such information.

5. The gel shown in Figure 2 has been cut very close to the Mr 90 kDa marker, and it
is not possible to assess what protein is visualized below this marker... inspection of Figure
1A reveals numerous lower molecular weight species in the silver stained gel... are these
also present in Figure 2?

6. The sequencing data reported in the manuscript gives no data regarding the yield of
each cycle sequenced for the alpha chain of your integrin. This is important information due
to the routine difficulty of many labs in sequencing directly from blots as you have shown in
your manuscript. Did you also sequence the beta chain since you had the purified protein
on a blot? Did it correspond to the exact sequence found for beta1?

7. Since you state that the conservation of integrin homology is normally >80%
throughout various species, why didn't you analyze some human [illegible]
lines for the existence of α9β1? One would venture that it would be simpler and more
informative than the procedure used here of analyzing other rat organs... this is especially
important since you already have a polyclonal antibody to the amino-terminus of your
integrin, and this would allow you to quantitate your observations, and not rely on the
destruction of organs and potential low recovery of your and other integrins. It would also
be much more informative to perform specific tissue histochemical analyses to directly locate
α9β1.

8. One could also speculate that your integrin alpha chain is not new, but a product of
alternative splicing which is quite prevalent (it seems) in the integrins and extracellular
matrix/cytoskeletal proteins.... Is it possible that your alpha chain is just a variant of an
existing alpha chain product? One could easily probe some liver libraries to detect the
cDNA and obtain full-length alpha chain sequence, especially since you already have an
existing specific polyclonal antibody.

9. The peptide specificity of α9β1 is intriguing, yet the preliminary observations need
to be expanded to better understand these studies. The dilemma of why α9β1 would
elute off an RGD-matrix, interact with immobilized laminin and then not be competed off
again from this surface by RGD-containing peptides is quite interesting yet the description
is too qualitative. Regarding the studies of Figure 6, is the difference seen in binding due to
a specific interaction of α9β1 with the various proteins or is it due to a difference in the actual
amounts of ligands bound to the polystyrene surface? What would a BSA-control coated
plate do in such a binding assay? It would be most interesting to rigorously study the binding
properties of α9β1 with various RGD-peptides, and highly defined fragments of laminin
and fibronectin; these should all be part of the initial description of α9β1 for publication.

Dear Dr. Hill,

Thank you for the rapid handling of our manuscript no. M3-3813,
"Purification and characterization of integrin α9β1" by Forsberg et al. We
have now revised the paper as suggested by the reviewer. We found his
comments to be fair and helpful for improvement of the paper. In addition
to further technical information, new experimental work has been added
to the manuscript. Enclosed in this letter is a package including one copy
of the previous version with xerox copies of the figures, and three copies of
the revised version containing full sets of original pictures.

Our response to the points raised by the referee is listed below. 

1. The temperature under which the purification was performed, flow 
rates during affinity chromatography, the amount of proteins and peptides 
coupled to Sepharose, and bed volume of columns have been included in
"Materials and Methods" (p. 9).

In the paper, the integrin retained on GRGDSPC-Sepharose is shown not 
to bind to Sepharose conjugated with GRGDS, GRGDSP, the 105 kD 
fibronectin fragment, collagen, or laminin. Further, none of the integrins 
that we have studied bind to BSA-, IgG-, or thiol-Sepharose (not shown in 
the paper). 

2. The sulfhydryl groups in the peptides used in this manuscript were all
deprotected as described in the reference given in "Materials and
Methods". To investigate the status of the sulfhydryl groups, two sets of
experiments were performed, which are described in the manuscript on
pages 7-8. The results demonstrate that the peptides are present as
monomers on the gel matrix. Furthermore, DTT was applied to the
GRGDSPC-Sepharose after the EDTA elution of integrin, and the material
released by the reducing agent was analyzed by silver-staining after SDS-PAGE.
No proteins could be detected in the DTT fractions. This result
indicates that α9β1 binds directly to the peptide, rather than to a thiol-binding
protein in the WGA-pool (p. 15).

3. SDS-PAGE was performed in 7% acrylamide gels, which is now
mentioned in "Materials and Methods" (p. 8).

4. The specific activity of ¹²⁵I-labeled α9β1 used in the solid phase receptor
assay (~10 000 cpm/ng protein) has been added to the legend of figure 7 (p.
27).

5. The small cut off picture was chosen since pictures of SDS-gels of our
preparations of α5β1 and α1β1 have been published previously (Johansson
et al. 1987, J. Biol. Chem. 262, 7819-7824; Forsberg et al. 1990, J. Biol.
Chem. 265, 6376-6381). In the revised manuscript this picture has been
replaced with a full-size photo of a new SDS-PAGE. In lanes 3 and 4 a
protein of 70/75 kD (non-reduced/reduced) is seen in addition to αvβ3. This
protein is obtained in variable amounts and is derived from the blood that
remains in the tissue after it has been cut into pieces and washed
(mentioned on p. 13).

6. For all amino acid sequences shown in the paper, the repetitive yield 
was 92-95% for each reaction (p. 10). 

We have now obtained partial amino acid sequences of the α9-associated β-unit
(β1), and of the αv-associated β-unit (β3). These sequences are
presented in fig. 4.

7. The peptide-antibody is working in immuno-precipitation and in
western blots of rat α9, but unfortunately it is too weak for immuno-histochemistry.
Also, the reactivity with human cells is poor. Therefore we
have to await the generation of better reagents for the determination of
the detailed location of α9. So far, we only know that it is a widespread
integrin, which previously has been overlooked, and in some cases may
have been taken as α2.

8. We have now obtained an amino acid sequence from an internal peptide
of α9. This sequence is homologous to other α-subunits, and is probably
located close to the middle of the protein in a region where the different α-subunits
are well conserved (presented as fig. 3B). Since α9 has unique
sequences at two distantly located sites, it is unlikely to represent a splice
variant of the previously identified integrin subunits.

We are currently trying to obtain cDNA clones of α9 by screening of
libraries and by PCR. This work is not always straightforward, and we
have not been successful yet.

9. The coating efficiency of the proteins and fragments used in the solid
phase assay has previously been found not to differ significantly except
for collagens, which are adsorbed less efficiently at low concentrations
(Timpl, Johansson, van Delden, Oberbäumer & Höök 1983, J. Biol. Chem.
258, 8922-8927; Perris & Johansson 1987, J. Cell Biol. 105, 2511-2521;
Perris, Paulsson & Bronner-Fraser 1989, Dev. Biol. 136, 222-238; Perris,
Krotoski & Bronner-Fraser 1991, Development 113, 969-984).
Nevertheless, the dishes coated with collagen were good substrates for cell
adhesion and binding of ¹²⁵I-labeled α9β1.

BSA-coated wells do not bind the integrin α9β1 and are used as a negative
control (now mentioned in the legend to fig. 7).

In order to perform more detailed competition experiments of ligand-binding
to the receptor, we tried to coat the wells with the RGDSPC
peptide coupled to BSA or ovalbumin via glutaraldehyde. Unfortunately,
this method did not allow for detection of α9β1-binding to the peptide.
However, we were able to locate regions in EHS-laminin and collagen I to
which α9β1 binds (fibronectin is not a ligand for this integrin, fig. 7A). Of
the fragments CB3, CB7, and CB8, which comprise ~50% of the collagen
α1(I) molecule, α9β1 was found to bind only to the CB8 fragment. Two large
fragments of EHS-laminin, E8 and P1, were also tested in this assay. α9β1
bound specifically to E8 while no binding to P1 occurred. The three
collagen fragments and the two laminin fragments were all good
substrates for hepatocyte adhesion, further indicating that major
differences in coating efficiency are not the reason for the different binding
of ¹²⁵I-labeled α9β1. These data are presented in Figures 7B and 8.

The binding of the integrin to EHS-laminin and to collagen I was specific
in the sense that it was inhibited (60%) by a 50 fold excess of unlabeled
α9β1. Further, the binding to collagen I was reduced by 70% if the protein
was incubated at 50 °C for 10 minutes, and then coated at 37 °C. This is
now mentioned in "Results" (p. 16).
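
As an illustration of two quantities quoted in the points above (the repetitive sequencing yield of 92-95% and the specific activity of about 10 000 cpm/ng), the following sketch shows how such figures are typically used. It is not taken from the manuscript: the function names, the 15-cycle run length, and the 5000 cpm reading are hypothetical, and a constant per-cycle yield is assumed.

```python
# Hypothetical helpers illustrating two figures quoted in the response.
# Assumptions: constant repetitive yield per Edman cycle; specific
# activity of roughly 10 000 cpm per ng of labeled protein.

def cumulative_yield(repetitive_yield: float, cycles: int) -> float:
    """Fraction of starting material still sequenceable after `cycles`
    cycles, given the same repetitive yield at every cycle."""
    return repetitive_yield ** cycles

def bound_ng(cpm: float, cpm_per_ng: float = 10_000.0) -> float:
    """Convert counted radioactivity (cpm) to nanograms of labeled protein."""
    return cpm / cpm_per_ng

# Illustrative run length of 15 cycles (not a value from the manuscript)
for y in (0.92, 0.95):
    print(f"repetitive yield {y:.0%}: {cumulative_yield(y, 15):.1%} left after 15 cycles")

# Illustrative reading of 5000 cpm (not a value from the manuscript)
print(f"5000 cpm corresponds to {bound_ng(5000):.2f} ng")
```

The per-cycle yield compounds multiplicatively, which is why it determines how far into a blotted protein a sequence can be read; the specific activity is what allows counts in the solid phase assay to be expressed as mass of bound integrin.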

We hope that these additions to the manuscript make it acceptable
for publication in the Journal of Biological Chemistry. Please send
editorial correspondence to the undersigned person, preferably by fax.

Yours sincerely


Erik Forsberg

Dr. E. Forsberg

Dept. of Medical and Physiological Chemistry

Univ. of Uppsala 

The Biomedical Center, Box 575 

S-751 23 Uppsala, SWEDEN 

Dear Dr. Forsberg, 

Your manuscript with Drs. Ek, Engström and Johansson entitled, "Purification and
Characterization of Integrin α9β1", has been reviewed by the reviewer who read
the original version. I am sorry to inform you that he did not find the additional
changes in the text sufficiently persuasive to change the recommendation to
decline the manuscript. He did acknowledge that the additional description of the
methodology and the new data improved the manuscript, yet also felt that more
definitive studies characterizing this new integrin and its adhesion properties would
be necessary. While most of the major methodology criticisms of the original
critique were answered, it should also be mentioned that within the rapidly
changing and emerging field of integrin biochemistry, it was felt that your studies
did not permit a full enough comparison with other known integrins. Furthermore,
the competition data with the peptides and protein fragments were interesting, yet it
also requires further scientific development. The reviewers felt that you did
substantially improve the manuscript, yet the original major concerns were not fully
answered in your revised manuscript. In view of these negative opinions, I must
decline your manuscript for publication in the Journal. I regret the need for a
negative decision, but thank you for submitting the manuscript to the Journal.

I am not returning the manuscript unless you ask but please find enclosed the 
artwork. 

Sincerely yours, 

Robert L Hill 
Associate Editor 

RLH/drm 


